ATTORNEY DOCKET NUMBER: 0402
Express Mail Label No.: EU604545486US 
Date of Deposit: January 13, 2004

CRYSTAL STRUCTURE OF HUMAN α-GALACTOSIDASE

Field of the Invention

This invention relates to the X-ray crystal structure of the human α-galactosidase
glycoprotein. More specifically, the invention relates to crystallized compositions of human
α-galactosidase and to crystallized complexes of human α-galactosidase and its catalytic
product α-galactose. The invention further relates to a computer programmed with the
structure coordinates of the human α-galactosidase active site, wherein said computer is
capable of displaying a three-dimensional representation of that active site. The invention
also relates to methods for rational drug design based on the structural data for human
α-galactosidase provided on computer-readable media, as analyzed on a computer system
having suitable computer algorithms.

Background of the Invention 

The lysosomal enzyme α-galactosidase (α-GAL or α-Gal A, E.C. 3.2.1.22, SEQ ID
NO:2) catalyzes the removal of galactose from oligosaccharides, glycoproteins, and
glycolipids during the catabolism of macromolecules (FIG. 5a). Deficiencies in lysosomal
enzymes lead to the accumulation of substrates in the tissues, conditions known as lysosomal
storage diseases. In humans, the absence of functional α-GAL leads to the accumulation of
galactosylated substrates (primarily globotriaosylceramide, FIG. 5b) in the tissues, leading to
Fabry disease, an X-linked recessive disorder first described in 1898 (Fabry, J., Arch.
Dermatol. Syph., 1898, 43:187) and characterized by chronic pain, ocular opacities, liver and
kidney impairment, skin lesions, vascular deterioration and/or cardiac deficiencies (Brady, R.
O., et al., N. Engl. J. Med., 1967, 276:1163-7; Desnick, R. J., et al., In The Metabolic and
Molecular Bases of Inherited Disease 8th edit. -Scriver, C. R., Beaudet, A. L., Sly, W. S. &
Valle, D., eds.-, 2001, pp. 3733-3774. McGraw-Hill, New York). Recombinant human
α-GAL has the ability to restore enzyme function in patients (Schiffmann, R., et al., JAMA,
2001, 285:2743-9; Eng, C. M., et al., N. Engl. J. Med., 2001, 345:9-16), and enzyme
replacement therapy using α-GAL was recently approved in the United States as a treatment
for Fabry disease. α-GAL became the second recombinant protein approved for the treatment
of a lysosomal storage disorder (after β-glucosidase, a treatment for Gaucher disease;
Beutler, E. & Grabowski, G. A., 2001, Gaucher Disease. In The Metabolic and Molecular
Bases of Inherited Disease 8th edit. -Scriver, C. R., Beaudet, A. L., Sly, W. S. & Valle, D.,
eds.-McGraw-Hill, New York), and α-GAL represents one of a small number of recombinant
human proteins approved for the treatment of any disease. A second treatment for Fabry
disease (specific for the cardiac variant of the disease) uses galactose infusion, which
presumably helps stabilize the mutant α-GAL protein (Frustaci, A., et al., N. Engl. J. Med.,
2001, 345:25-32). In addition to enzyme replacement therapy and galactose infusion, gene
replacement therapy using the α-GAL gene shows potential as a treatment for Fabry disease
(Park, J., et al., Proc Natl Acad Sci USA, 2003, 100:3450-4).

There are currently two recombinant glycoprotein products, REPLAGAL™
(Transkaryotic Therapies, Inc., Cambridge, MA) and FABRAZYME™ (Genzyme, Inc.,
Cambridge, MA), available for enzyme replacement therapy used in the treatment of Fabry
disease (Schiffmann, R., et al., JAMA, 2001, 285:2743-9; Eng, C. M., et al., N. Engl. J. Med.,
2001, 345:9-16). These two glycoproteins have identical amino acid sequences but are
produced in different cell lines, resulting in different glycosylation at the N-linked
carbohydrate attachment sites. REPLAGAL™ is produced in a genetically engineered human
cell line, while FABRAZYME™ is produced in a Chinese hamster ovary (CHO) cell line.
REPLAGAL™ contains a greater amount of complex carbohydrate, while FABRAZYME™
contains a higher fraction of sialylated and phosphorylated carbohydrate (Lee, K., et al.,
Glycobiology, 2003, 13:305-13). Because the polypeptide sequence of the two glycoproteins
is identical, these differences in carbohydrate composition are solely responsible for the
differences in tissue distribution and dose response of the two enzyme replacement therapies.

α-GAL has also attracted attention for its ability to convert human blood group
antigens. Recombinant α-GAL has been used to convert blood of type B into blood of type
O, the universal donor type (Zhu, A., et al., Arch. Biochem. Biophys., 1996, 327:324-9), a
process currently in clinical trials.

Because of its utility in the treatment of Fabry disease and as a reagent for converting
human blood types, much effort has been put into the expression and purification of large
amounts of human α-GAL. The endogenous enzyme has been purified from human placenta
(Mayes, J. S. & Beutler, E., Biochim Biophys Acta, 1977, 484:408-16), liver cells (Dean, K. J.
& Sweeley, C. C., J Biol Chem, 1979, 254:9994-10000), spleen cells and plasma (Bishop, D.
F. & Desnick, R. J., J Biol Chem, 1981, 256:1307-16), and fibroblasts (Lemansky, P., et al., J
Biol Chem, 1987, 262:2062-5); recombinant enzyme has been produced in E. coli bacterial
cells (Hantzopoulos, P. A. & Calhoun, D. H., Gene, 1987, 57:159-69), COS monkey cells
(Tsuji, S., et al., Eur J Biochem, 1987, 165:275-80), CHO cells (Ioannou, Y. A., et al., J Cell
Biol, 1992, 119:1137-50), baculovirus-infected Sf9 insect cells (Coppola, G., et al., Gene,
1994, 144:197-203; Chen, Y., et al., Protein Expr Purif, 2000, 20:228-36), Pichia pastoris
yeast cells (Chen, Y., et al., Protein Expr Purif, 2000, 20:472-84), transduced human bone
marrow cells (Takenaka, T., et al., Exp Hematol, 1999, 27:1149-59), and continuously
cultured genetically engineered human fibroblasts (Schiffmann, R., et al., JAMA, 2001,
285:2743-9). Despite the ability to successfully express and purify human α-GAL since
1977, the three-dimensional structure has not been solved, although a crystallization report
appeared in 1994 (Murali, R., et al., J. Mol Biol, 239:578-80). Structural analysis has been
hindered by the heterogeneous carbohydrates on the glycoprotein, which comprise 5-15% of
the mass of the secreted material and contain over 70 different species built upon 23 different
core structures (Matsuura, F., et al., Glycobiology, 1998, 8:329-39).

Thus, there is a great need to solve the crystal structure of α-GAL and, in particular,
to delineate the active site of the enzyme. With this information, computer models of this
active/binding site can be created and potential agonists and antagonists of α-GAL can be
rationally designed.

Summary of the Invention 

This invention provides the crystal structure of human α-GAL. The crystal structure
has been solved by X-ray crystallography to a resolution of 3.25 Å. Based upon the crystal
structure we have characterized human α-GAL in detail and identified the key amino acid
residues that make up the active/binding site of the enzyme. These coordinates are useful in
methods for designing agonists and antagonists of the enzyme, which in turn may be useful in
treating Fabry and other diseases.

The invention also provides the X-ray structure coordinates of a complex comprising
α-GAL and its catalytic product, α-galactose.

In another aspect the invention provides a computer programmed with the coordinates
of the human α-GAL active/binding site, and with a program capable of converting those
coordinates into a three-dimensional representation of the active site on a display connected
to the computer.

In a further aspect, the invention provides a computer which, when programmed with
at least a portion of the structural coordinates of human α-GAL and an X-ray diffraction data
set of a different molecule or molecular complex, performs a Fourier transform of the
human α-GAL structural coordinates and then processes the X-ray
diffraction data into structure coordinates of the different molecule or molecular complex via
the process of molecular replacement.

These and other objects of the invention will be described in further detail in
connection with the detailed description of the invention.

Brief Description of the Sequences

SEQ ID NO:1 is the nucleotide sequence of the human α-GAL cDNA.
SEQ ID NO:2 is the predicted amino acid sequence of the translation product of
human α-GAL cDNA (SEQ ID NO:1).

Brief Description of the Drawings 

FIG. 1 (pp. 1-82) lists the atomic structure coordinates for human α-GAL as derived
by X-ray diffraction from a crystal of human α-GAL dimer. The following abbreviations are
used in FIG. 1: "Atom type" refers to the element whose coordinates are measured. The first
letter in the column defines the element.

"X, Y, Z" crystallographically define the atomic position of the element measured.
"OCC" is an occupancy factor that refers to the fraction of the molecules in which
each atom occupies the position specified by the coordinates. A value of "1" indicates that
each atom has the same conformation, i.e., the same position, in all molecules of the crystal.
"B" is a thermal factor that measures movement of the atom around its atomic center.
FIG. 2 shows a diagram of a computer used to generate a three-dimensional graphical
representation of a molecule or molecular complex according to this invention.
FIG. 3 shows a cross section of a magnetic storage medium.

FIG. 4 shows a cross section of an optically-readable data storage medium. 
FIG. 5 is a schematic showing the reaction catalyzed by α-GAL; FIG. 5(a) the general
reaction of α-GAL; FIG. 5(b) α-GAL and Fabry disease.

FIG. 6 depicts a stereo ribbon diagram of the overall fold of: (a) the α-GAL
monomer; (b) and (c) the α-GAL dimer (two views); (d) the surface of α-GAL.

FIG. 7 is a phylogeny tree depicting the evolutionary relationships in the
α-GAL/α-NAGAL family.
FIG. 8 depicts electron density maps showing the active site of human α-GAL from
(a) native and (b) galactose-soaked crystals; FIG. 8(c) shows the superimposed active sites of
human α-GAL (green) and chicken α-NAGAL (yellow).

FIG. 9 depicts the N-linked carbohydrate attached to N192 of human α-GAL,
shown with helix α4. Electron density from a σA-weighted simulated annealing composite
omit map (grey) is contoured at 1.1σ. Five sugar residues have been built into the electron
density at this site.

FIG. 10 is a schematic representation of the human α-GAL active site with a
galactose molecule.

Detailed Description of the Invention 

As mentioned above, we have solved the three-dimensional X-ray crystal structure of
human α-galactosidase. The atomic coordinate data are presented in FIG. 1.

In order to use the structure coordinates generated for the human α-galactosidase, its
active site or portions or homologues thereof, it is often necessary to convert them into
a three-dimensional shape. This is achieved through the use of commercially available
software that is capable of generating three-dimensional graphical representations of
molecules or portions thereof from a set of structure coordinates.
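As an illustration of the first step such software performs, the following minimal sketch reads coordinate records into memory. It assumes, purely for illustration, that the coordinates are available as fixed-column PDB-style ATOM records; the exact layout of FIG. 1 is not restated here, and the names `Atom`, `parse_atom_line` and `parse_atoms` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str       # atom name, e.g. "CA"; its first letter gives the element
    res_name: str   # three-letter residue name
    chain: str      # chain identifier (one monomer of the dimer)
    res_num: int    # residue number
    x: float        # orthogonal coordinates in angstroms
    y: float
    z: float
    occ: float      # occupancy factor ("OCC")
    b: float        # thermal factor ("B")


def parse_atom_line(line: str) -> Atom:
    """Parse one record, ASSUMING the standard PDB fixed-column layout."""
    return Atom(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_num=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occ=float(line[54:60]),
        b=float(line[60:66]),
    )


def parse_atoms(text: str) -> list:
    """Parse every ATOM/HETATM record in a block of text, skipping other lines."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
```

A list of `Atom` objects of this kind is all that a display program needs in order to draw the molecule; the rendering step itself is outside the scope of this sketch.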

An "active site", also referred to as "binding site" elsewhere herein, is of significant
utility in fields such as drug discovery. The association of natural ligands or substrates with
the active site(s) (or "binding pocket") of their corresponding receptors or enzymes is the
basis of many biological mechanisms of action. Similarly, many drugs exert their biological
effects through association with the binding pockets of receptors and enzymes. Such
associations may occur with all or any parts of the binding pocket. An understanding of such
associations will help lead to the design of drugs having more favorable associations with
their target receptor or enzyme, and thus, improved biological effects. Therefore, this
information is valuable in designing potential agonists and antagonists of the binding sites of
biologically important targets.

The term "active site" (or "binding pocket"), as used herein, refers to a specific region
of an enzyme that, as a result of its shape, favorably associates with its substrate and in
which catalysis occurs.

We have identified at least one active site per monomer in human α-GAL, which is a
good target for designing agonists and/or antagonists and/or inhibitors.
The term "α-GAL-like binding pocket", as used herein, refers to a portion of a
molecule or molecular complex whose shape is sufficiently similar to the human α-GAL
binding pocket, so as to bind common ligands. These commonalities of shape are defined by a
root mean square deviation from the structure coordinates of the backbone atoms of the
amino acids that make up these binding pockets in the human α-GAL structure (as set forth
in FIG. 1) of not more than 1.5 Å. The method of performing this calculation is described
below.

The X-ray structure reveals human α-GAL as a homodimeric glycoprotein with each
monomer composed of two domains, a (β/α)8 domain containing the active site and a
C-terminal domain containing eight antiparallel β strands on two sheets in a β sandwich (FIG.
6a). After removal of the 31 residue signal sequence, the first domain extends from residues
32 to 330 and contains the active site formed by the C-terminal ends of the β strands at the
center of the barrel, a typical location for the active site in (β/α)8 domains. The second domain,
comprising residues 331 to 429, packs against the first with an extensive interface, burying
2500 Å² of surface area within one monomer. The dimer has overall protein dimensions of
approximately 75 × 75 × 50 Å (FIG. 6b). The molecule is concave in the third dimension and
varies in thickness from approximately 20 to 50 Å (FIG. 6c). Electron density is visible for
390 and 391 amino acid residues (out of 398 total) in the two copies of the monomer in the
crystallographic asymmetric unit; the missing residues occur at the C-terminus. The two
monomers pack with an interface that extends the 75 Å width of the dimer and buries 2200 Å²
of surface area. Thirty residues from each monomer contribute to the dimer
interface, from loops β1-α1, β6-α6, β7-α7, β8-α8, β11-β12, and β15-β16. The dimer is
markedly negatively charged, as seen in a surface electrostatic potential (FIG. 6d). With 47
carboxylate groups and only 36 basic residues in the 398 residues in the molecule, the overall
charge per monomer is expected to be -11 at neutral pH. The carboxylates are most
concentrated around the active site, but in the low pH of the lysosome, many of these groups
become protonated, reducing the charge on the molecule. In addition to the negative charges
on the protein, the N-linked carbohydrate is highly phosphorylated and sialylated (Lee, K., et
al., Glycobiology, 2003, 13:305-13) (see below), further increasing its negative electrostatic
potential. The N-linked carbohydrates fall distal to the active sites (FIG. 6d). Each monomer
contains three N-linked carbohydrate sites, five disulfide bonds (C52-C94, C56-C63,
C142-C172, C202-C223, and C378-C382), two unpaired cysteines (C90 and C174), and three
cis prolines (P210, P380, and P389).
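The charge estimate quoted above is simple bookkeeping, which the following sketch makes explicit. It assumes that every carboxylate carries a charge of -1 and every basic residue +1 at neutral pH; the function names are hypothetical, and the sequence-based helper further assumes that only Asp/Glu and Lys/Arg are counted, which may not match the way the 47 and 36 figures were tallied.

```python
def net_charge(n_acidic: int, n_basic: int) -> int:
    """Formal net charge: each basic residue is +1, each carboxylate is -1."""
    return n_basic - n_acidic


def count_charged(sequence: str) -> tuple:
    """Count (acidic, basic) residues in a one-letter sequence.

    Assumption: D and E are acidic; K and R are basic; histidine and the
    chain termini are ignored.
    """
    seq = sequence.upper()
    acidic = sum(seq.count(r) for r in "DE")
    basic = sum(seq.count(r) for r in "KR")
    return acidic, basic


# Figures quoted in the text for one monomer: 47 carboxylates, 36 basic residues.
monomer_charge = net_charge(n_acidic=47, n_basic=36)
```

With the figures from the text, `monomer_charge` evaluates to -11, in agreement with the value stated above.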
As mentioned above, the C-terminal seven or eight residues of each chain have no
electron density associated with them and are presumably disordered. This disorder is
consistent with the observation of slight heterogeneity in the C-terminus of recombinant
human α-GAL, where the truncation of one or two residues from the C-terminus can occur
but has no effect upon the activity of the enzyme (Lee, K., et al., Glycobiology, 2003,
13:305-13). The structure offers no support for the observation that the removal of 2 to 10 residues
from the C-terminus increases the activity of α-GAL (Miyamura, N., et al., J Clin Invest,
1996, 98:1809-17), because the final residue seen in the structure falls at least 45 Å from each
active site and on the opposite face of the molecule.

In both the native and galactose-soaked crystal structures, electron density appears in
the two crystallographically-independent active sites (FIGS. 8a and b). In the galactose-soaked
crystal, this density represents α-galactose, the normal catalytic product of the
enzyme (Ki ~1 mM). In the native structure, this density most likely derives from the
cryoprotectant ethylene glycol, a weak inhibitor of glycoside hydrolases (Tsitsanou, K. E., et
al., Protein Sci, 1999, 8:741-9), analogous to the insertion of glycerol into carbohydrate
binding sites on proteins (Garman, S. C., et al., Structure, 2002, 10:425-434; Tsitsanou, K. E.,
et al., Protein Sci, 1999, 8:741-9; Schmidt, A., et al., Protein Sci, 1998, 7:2081-8). The two
active sites of the dimer are separated by approximately 50 Å. As the enzyme shows little
change between the liganded and unliganded structures, there is no evidence for cooperativity
between the two sites, although the biochemical evidence is mixed (Dean, K. J. & Sweeley,
C. C., J Biol Chem, 1979, 254:9994-10000; Bishop, D. F. & Desnick, R. J., J Biol Chem,
1981, 256:1307-16).

We have determined that human α-GAL binds α-galactose by making specific
contacts to each functional group on the monosaccharide. Residues from seven loops in
domain 1 form the active site: β1-α1, β2-α2, β3-α3, β4-α4, β5-α5, β6-α6, and β7-α7. The
active site is formed by the side chains of residues W47, D92, D93, Y134, C142, K168,
D170, E203, L206, Y207, R227, D231, D266, and M267. Thus, a binding pocket defined by
the structural coordinates of these amino acids, as set forth in FIG. 1, or a binding pocket
whose root mean square deviation from the structure coordinates of the backbone atoms of
these amino acids is not more than 1.5 Å, is considered a human α-GAL-like binding pocket
of this invention. In important embodiments, C172 makes a disulfide bond to C142.
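For comparison purposes, the backbone atoms of the fourteen residues listed above have to be pulled out of a full coordinate set. The sketch below shows one way this selection might be done. The residue numbers come from the text; the input layout (tuples of chain, residue number, atom name and coordinates), the default chain label "A" and the function names are assumptions made for illustration.

```python
# Residue numbers of the binding pocket as listed in the text
# (W47, D92, D93, Y134, C142, K168, D170, E203, L206, Y207, R227, D231, D266, M267).
POCKET_RESIDUES = (47, 92, 93, 134, 142, 168, 170, 203, 206, 207, 227, 231, 266, 267)

# Backbone atom names in the usual PDB spelling (CA is the alpha carbon).
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def pocket_backbone(atoms, chain="A"):
    """Return {(res_num, atom_name): (x, y, z)} for pocket backbone atoms.

    `atoms` is an iterable of (chain, res_num, atom_name, (x, y, z)) tuples.
    The chain label is an assumption; use whatever label the coordinate
    file assigns to the monomer of interest.
    """
    wanted = set(POCKET_RESIDUES)
    return {
        (res_num, name): xyz
        for ch, res_num, name, xyz in atoms
        if ch == chain and res_num in wanted and name in BACKBONE_ATOMS
    }


def missing_atoms(selection):
    """List the (res_num, atom_name) pairs that the selection does not contain."""
    return [
        (res, name)
        for res in POCKET_RESIDUES
        for name in BACKBONE_ATOMS
        if (res, name) not in selection
    ]
```

Checking `missing_atoms` before any comparison is a cheap safeguard, since a deviation computed over an incomplete set of equivalent atoms would not correspond to the pocket definition given above.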

In the α-GAL/α-NAGAL family, specificity for the 2 position on the galactose ligand
occurs via the β5-α5 loop. This was called the "N-acetyl recognition loop" in α-NAGAL
(Garman, S. C., et al., Structure, 2002, 10:425-434); in the overall α-GAL/α-NAGAL family,
"2 position recognition loop" or "2 loop" is appropriate. This loop falls near the boundary of
exons 4 and 5 of animal α-GAL/α-NAGAL, which have a small insertion in this region,
resulting in a short helical stretch at the top of the β5 strand; this insertion is absent in other
species. Plant and fungal α-GALs use a Cys and a Trp on this loop to coordinate the
2-hydroxyl on galactose; animal α-GAL uses a Glu and a Leu to recognize the 2-hydroxyl
(FIG. 7, green) while animal α-NAGAL uses a Ser and an Ala to recognize an N-acetyl at the
2 position (FIG. 7, yellow). In the animal enzymes, the larger Glu and Leu side chains
sterically block the larger N-acetyl substituent, while the smaller Ser and Ala side chains
nicely accommodate an N-acetyl group and tolerate a hydroxyl group.

With three different conformations in the 2 loop now identified, the substrate
specificity of the other members of the family can be categorized by homology. For
example, genome sequencing of Drosophila melanogaster and Anopheles gambiae has
identified a pair of genes in the α-GAL family in each species. By examination of the
sequences in the 2 loop, two are clearly α-NAGALs while the other two appear to be α-GALs
(FIG. 7, yellow and purple). Surprisingly, Aspergillus niger contains an enzyme identified as
α-GAL that, although only 30% identical to the animal protein sequences, contains a 2 loop
virtually identical to animal α-NAGALs (FIG. 7, yellow). We predict this enzyme is primarily an
α-NAGAL with partial α-GAL activity, much like human α-NAGAL, which was originally
thought to be an α-GAL based upon similar activity (Dean, K. J., et al., Biochem. Biophys.
Res. Commun., 1977, 77:1411-7; Schram, A. W., et al., Biochim. Biophys. Acta, 1977,
482:138-44).
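The categorization by homology described above amounts to a lookup on the two residues of the 2 loop. The following sketch encodes only the three residue pairs named in the text; locating those two positions in a new sequence requires an alignment that is not shown, and the function name and label strings are hypothetical.

```python
# Residue pairs on the 2 loop and the specificity the text associates with them.
LOOP2_SIGNATURES = {
    ("C", "W"): "alpha-GAL (plant/fungal type)",
    ("E", "L"): "alpha-GAL (animal type)",
    ("S", "A"): "alpha-NAGAL",
}

# Three-letter to one-letter codes for the six residues involved.
THREE_TO_ONE = {"CYS": "C", "TRP": "W", "GLU": "E", "LEU": "L", "SER": "S", "ALA": "A"}


def classify_by_2_loop(first: str, second: str) -> str:
    """Predict specificity from the two 2-loop residues.

    Accepts one-letter or three-letter residue codes. The caller is assumed
    to have identified the two equivalent positions by sequence alignment.
    """
    def one_letter(residue: str) -> str:
        residue = residue.strip().upper()
        return THREE_TO_ONE.get(residue, residue)

    key = (one_letter(first), one_letter(second))
    return LOOP2_SIGNATURES.get(key, "unclassified")
```

A pair outside the three known signatures is reported as "unclassified" rather than forced into a category, since the text identifies only three conformations of the loop.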

Although human α-GAL makes contacts to each functional group on the α-galactose
ligand, the enzyme shows little specificity for the distal portion of the substrate beyond the
glycosidic linkage, and the active site cleft is found in a broad opening on the concave
surface of the enzyme (FIG. 6c). The lack of substrate specificity of human α-GAL beyond
the terminal α-galactose differs slightly from the specificity of other α-GALs, which act only
upon substrates containing terminal α1-6 galactose groups (Kim, W. D., et al., Phytochemistry,
2002, 61:621-30). This increased specificity of plant α-GALs may derive from their
monomeric structure, as residues buried in the dimer interface of animal α-GALs (e.g., those
on the β1-α1 loop; Fujimoto, Z., et al., J Biol Chem, 2003, 278:20313-8) are available for
ligand recognition in monomeric α-GALs.




Both α-GALs and α-NAGALs are retaining exoglycosidases, where both the
substrate and product of the catalytic reactions are α anomers at the 1 position on the
galactose ring. This retention of anomeric configuration is accomplished by a double
displacement catalytic mechanism where the anomeric carbon undergoes two successive
nucleophilic attacks (Vasella, A., et al., Curr Opin Chem Biol, 2002, 6:619-29). The two
sequential inversions of the anomeric carbon lead to retention of the configuration at the end
of the catalytic cycle. In two α-GALs from different species, peptic digestion of covalently
trapped intermediates has identified the specific aspartic acid acting as the catalytic
nucleophile (Hart, D. O., et al., Biochemistry, 2000, 39:9826-36; Ly, H. D., et al., Carbohydr.
Res., 2000, 329:539-47). These data, combined with the high resolution structure of chicken
α-NAGAL, predict the catalytic mechanism of human α-GAL. In human α-GAL, the first
nucleophilic attack upon the substrate comes from D170, cleaving the glycosidic linkage and
leading to a covalent enzyme-intermediate complex. In the second step of the reaction, a
water molecule (deprotonated by D231) attacks C1 of the covalent intermediate, liberating
the second half of the catalytic product and regenerating the enzyme in its initial state.
Human α-GAL operates most efficiently at low pH, consistent with its highly acidic
composition and its lysosomal location.

Retaining glycosidases typically have distances of 5-6 Å between catalytic
carboxylates, while inverting glycosidases typically have distances of 9-11 Å between these
residues (McCarter, J. D. & Withers, S. G., Curr. Opin. Struct. Biol., 1994, 4:885-92). From
these distances, it has been possible to reliably predict the mechanism and function of a
glycosidase given its structure. However, this rule must be reconsidered in light of the new
structures in the α-GAL/α-NAGAL family: for the known structures in the family, the closest
approach of the two catalytic carboxylates is 6.5-7 Å, among the largest distances seen for
retaining glycosidases.
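The distance rule discussed above can be written down as a short check. This sketch measures the closest approach between the carboxylate oxygens of two residues and compares it with the typical ranges quoted in the text; the function names, the hard range boundaries and the returned labels are illustrative assumptions, and the coordinates would have to be supplied by the reader.

```python
import math


def carboxylate_separation(oxygens_a, oxygens_b):
    """Closest approach (in the input units) between two carboxylate groups.

    Each argument is a list of (x, y, z) positions of the carboxylate
    oxygen atoms of one catalytic residue.
    """
    return min(math.dist(a, b) for a in oxygens_a for b in oxygens_b)


def mechanism_hint(distance):
    """Compare a separation in angstroms with the typical ranges in the text."""
    if 5.0 <= distance <= 6.0:
        return "typical of retaining glycosidases"
    if 9.0 <= distance <= 11.0:
        return "typical of inverting glycosidases"
    return "outside the typical ranges"
```

A separation of 6.5-7 Å, as reported above for the α-GAL/α-NAGAL family, falls outside both ranges under this simple rule, which illustrates why the text suggests that the rule be reconsidered.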

It will be readily apparent to those of skill in the art that the numbering of amino acids
in other isoforms of human α-GAL may be different from that set forth herein.
Corresponding amino acids in other isoforms of human α-GAL are easily identified by visual
inspection of the amino acid sequences or by using commercially available homology
software programs. Each of those amino acids of human α-GAL is defined by a set of
structure coordinates set forth in FIG. 1. The term "structure coordinates" refers to Cartesian
coordinates derived from mathematical equations related to the patterns obtained on
diffraction of a monochromatic beam of X-rays by the atoms (scattering centers) of a protein
or protein-ligand complex in crystal form. The diffraction data are used to calculate an
electron density map of the repeating unit of the crystal. The electron density maps are then
used to establish the positions of the individual atoms of the enzyme or enzyme complex.

Those of skill in the art understand that a set of structure coordinates for an enzyme or
an enzyme-complex or a portion thereof, is a relative set of points that define a shape in three
dimensions. Thus, it is possible that an entirely different set of coordinates could define a
similar or identical shape. Moreover, slight variations in the individual coordinates will have
little effect on overall shape. In terms of binding pockets, these variations would not be
expected to significantly alter the nature of ligands that could associate with those pockets.

The term "associating with" refers to a condition of proximity between a chemical
entity or compound, or portions thereof, and a binding pocket or binding site on a protein.
The association may be non-covalent, wherein the juxtaposition is energetically favored by
hydrogen bonding or van der Waals or electrostatic interactions, or it may be covalent.

The variations in coordinates discussed above may be generated because of
mathematical manipulations of the human α-GAL structure coordinates. For example, the
structure coordinates set forth in FIG. 1 could be manipulated by crystallographic
permutations of the structure coordinates, fractionalization of the structure coordinates,
integer additions or subtractions to sets of the structure coordinates, inversion of the structure
coordinates or any combination of the above.
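The following sketch illustrates why such manipulations leave the described shape unchanged: interatomic distances are the same before and after. It assumes an orthorhombic unit cell with made-up edge lengths so that fractionalization is a simple division; the cell of the actual crystal is not used here, and all names are hypothetical.

```python
import math

# Hypothetical orthorhombic cell edges (a, b, c) in angstroms, for illustration only.
CELL = (50.0, 60.0, 70.0)


def to_fractional(xyz, cell=CELL):
    """Fractionalization: express orthogonal coordinates as fractions of the cell edges."""
    return tuple(v / edge for v, edge in zip(xyz, cell))


def to_cartesian(frac, cell=CELL):
    """Convert fractional coordinates back to orthogonal angstrom coordinates."""
    return tuple(f * edge for f, edge in zip(frac, cell))


def lattice_shift(frac, shift):
    """Integer addition or subtraction applied to fractional coordinates."""
    return tuple(f + s for f, s in zip(frac, shift))


def invert(frac):
    """Inversion through the origin. Distances are preserved; handedness is flipped."""
    return tuple(-f for f in frac)


def manipulate(xyz, shift=(1, -2, 0)):
    """Fractionalize, shift by whole cells, invert, and return to orthogonal space."""
    return to_cartesian(invert(lattice_shift(to_fractional(xyz), shift)))
```

Applying `manipulate` to every atom of a structure yields a numerically different coordinate set in which every atom-to-atom distance, and hence the shape described, is unchanged.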

Alternatively, modifications in the crystal structure due to mutations, additions,
substitutions, and/or deletions of amino acids, or other changes in any of the components that
make up the crystal could also account for variations in structure coordinates. If such
variations are within an acceptable standard error as compared to the original coordinates, the
resulting three-dimensional shape is considered to be the same. Thus, for example, a ligand
(e.g., substrate) that bound to the α-GAL active site would also be expected to bind to
another binding pocket whose structure coordinates defined a shape that fell within the
acceptable error.

Various computational analyses are therefore necessary to determine whether a
molecule or the binding pocket portion thereof is sufficiently similar to the α-GAL
active/binding site described above. Such analyses may be carried out in well-known software
applications, such as the Molecular Similarity application of Quanta™ (Molecular
Simulations Inc., San Diego, CA) version 4.1, and as described in the accompanying User's
Guide.

The Molecular Similarity application permits comparisons between different
structures, different conformations of the same structure, and different parts of the same
structure. The procedure used in Molecular Similarity to compare structures is divided into
four steps: 1) load the structures to be compared; 2) define the atom equivalences in these
structures; 3) perform a fitting operation; and 4) analyze the results.

Each structure is identified by a name. One structure is identified as the target (i.e.,
the fixed structure); all remaining structures are working structures (i.e., moving structures).
Since atom equivalency within Quanta™ is defined by user input, for the purpose of this
invention we will define equivalent atoms as protein backbone atoms (N, Cα, C and O) for
all conserved residues between the two structures being compared. We also consider only
rigid fitting operations.

When a rigid fitting method is used, the working structure is translated and rotated to
obtain an optimum fit with the target structure. The fitting operation uses an algorithm that
computes the optimum translation and rotation to be applied to the moving structure, such
that the root mean square difference of the fit over the specified pairs of equivalent atoms is an
absolute minimum. This number, given in angstroms (Å), is reported by Quanta™.

For the purpose of this invention, any molecule or molecular complex or binding
pocket thereof that has a root mean square deviation of conserved residue backbone atoms
(N, Cα, C and O) of less than 1.5 Å when superimposed on the relevant backbone atoms
described by structure coordinates listed in FIG. 1 is considered identical. More preferably,
the root mean square deviation is less than 1.0 Å.
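A rigid fit followed by a root mean square deviation can be sketched as follows. This is not the algorithm used by Quanta™, which is not described here; it is a stand-in based on the quaternion method of Horn, with the dominant eigenvector found by shifted power iteration so that only the standard library is needed. The function names, the iteration count and the starting vector are assumptions, and convergence can be slow for nearly collinear atom sets.

```python
import math


def _centered(points):
    """Translate a list of (x, y, z) points so that their centroid is the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def rigid_fit_rmsd(moving, target, iterations=500):
    """RMSD after optimal rigid superposition of `moving` onto `target`.

    Both arguments are equally long lists of paired (x, y, z) atoms.
    """
    if len(moving) != len(target) or not moving:
        raise ValueError("need two non-empty, equally long lists of paired atoms")
    p, q = _centered(moving), _centered(target)
    # Cross-covariance between moving (rows) and target (columns) coordinates.
    s = [[sum(a[i] * b[j] for a, b in zip(p, q)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its top eigenvector is the optimal quaternion.
    n4 = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by a Gershgorin bound so the wanted eigenvalue has the largest magnitude.
    shift = max(sum(abs(v) for v in row) for row in n4)
    quat = [0.8, 0.4, 0.4, 0.2]  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(n4[i][j] * quat[j] for j in range(4)) + shift * quat[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break
        quat = [v / norm for v in w]
    q0, qx, qy, qz = quat
    rot = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2 * (qx*qy - q0*qz), 2 * (qx*qz + q0*qy)],
        [2 * (qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2 * (qy*qz - q0*qx)],
        [2 * (qz*qx - q0*qy), 2 * (qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    total = 0.0
    for a, b in zip(p, q):
        for i in range(3):
            moved = sum(rot[i][j] * a[j] for j in range(3))
            total += (moved - b[i]) ** 2
    return math.sqrt(total / len(p))


def within_cutoff(moving, target, cutoff=1.5):
    """True if the fitted RMSD is below the cutoff (1.5 angstroms in the text)."""
    return rigid_fit_rmsd(moving, target) < cutoff
```

In use, `moving` and `target` would hold the equivalent backbone atoms (N, Cα, C and O) of the conserved residues of the two structures, listed in the same order.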

The term "root mean square deviation" means the square root of the arithmetic mean
of the squares of the deviations from the mean. It is a way to express the deviation or
variation from a trend or object. For purposes of this invention, the "root mean square
deviation" defines the variation in the backbone of a protein from the backbone of human
α-GAL or a binding pocket portion thereof, as defined by the structure coordinates of human
α-GAL described herein.
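Written out for a pair of superimposed structures, the quantity takes the following form. The notation is ours and is offered only as an illustration: N is the number of equivalent backbone atoms, r_i is the position of atom i in the structure being compared, and r'_i is the position of the equivalent atom in human α-GAL.

```latex
\mathrm{RMSD} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{r}_i - \mathbf{r}'_i \right\rVert^{2}}
```

Here each deviation is the distance between a pair of equivalent atoms after superposition, so the result has units of angstroms.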

Therefore, according to one aspect of the invention a computer is provided for
producing:

(a) a three-dimensional representation of a molecule or molecular complex, wherein
said molecule or molecular complex comprises a binding pocket defined by structure
coordinates of human α-galactosidase amino acids W47, D92, D93, Y134, C142, K168,
D170, E203, L206, Y207, R227, D231, D266, and M267, according to FIG. 1; or

(b) a three-dimensional representation of a homologue of said molecule or molecular
complex, wherein said homologue comprises a binding pocket that has a root mean square
deviation from the backbone atoms of said amino acids of not more than 1.5 Å, wherein said
computer comprises:

(i) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said data comprises the
structure coordinates of human α-galactosidase amino acids W47, D92, D93,
Y134, C142, K168, D170, E203, L206, Y207, R227, D231, D266, and M267,
according to FIG. 1;

(ii) a working memory for storing instructions for processing said machine- 
readable data; 

(iii) a central-processing unit coupled to said working memory and to said 
machine-readable data storage medium for processing said machine readable 
data into said three-dimensional representation; and 

(iv) a display coupled to said central-processing unit for displaying said three- 
dimensional representation. 

In an important embodiment, C172 makes a disulfide bond to C142. 

According to another aspect of the invention, a computer for producing a
three-dimensional representation of a molecule or molecular complex defined by structure
coordinates of all of the human α-GAL amino acids set forth in FIG. 1, or a
three-dimensional representation of a homologue of said molecule or molecular complex, is
provided. The homologue comprises a binding pocket that has a root mean square deviation
from the backbone atoms of said amino acids of not more than 1.5 Å. In this aspect of the
invention, the machine-readable data contain the coordinates of all of human α-GAL.

According to a further aspect, the invention provides a computer for determining at 
least a portion of the structure coordinates corresponding to X-ray diffraction data obtained 
from a molecule or molecular complex, wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said data comprises at least a portion of the 
structural coordinates of human α-GAL according to FIG. 1;

(b) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said data comprises X-ray diffraction data from
said molecule or molecular complex;

(c) a working memory for storing instructions for processing said machine-readable
data of (a) and (b);

(d) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium of (a) and (b) for performing a Fourier transform of the 
machine readable data of (a) and for processing said machine readable data of (b) into 
structure coordinates; and 

(e) a display coupled to said central-processing unit for displaying said structure
coordinates of said molecule or molecular complex.

FIG. 2 demonstrates one version of the foregoing aspects. System 10 includes a
computer 11 comprising a central processing unit ("CPU") 20, a working memory 22 which
may be, e.g., RAM (random-access memory) or "core" memory, mass storage memory 24
(such as one or more disk drives or CD-ROM drives), one or more cathode-ray tube ("CRT")
display terminals 26, one or more keyboards 28, one or more input lines 30, and one or more
output lines 40, all of which are interconnected by a conventional bi-directional system bus
50.

Input hardware 36, coupled to computer 11 by input lines 30, may be implemented in
a variety of ways. Machine-readable data of this invention may be inputted via the use of a
modem or modems 32 connected by a telephone line or dedicated data line 34. Alternatively 
or additionally, the input hardware 36 may comprise CD-ROM drives or disk drives 24. In 
conjunction with display terminal 26, keyboard 28 may also be used as an input device. 

Output hardware 46, coupled to computer 11 by output lines 40, may similarly be
implemented by conventional devices. By way of example, output hardware 46 may include
CRT display terminal 26 for displaying a graphical representation of a binding pocket of this 
invention using a program such as Quanta™ as described herein. Output hardware might also 
include a printer 42, so that hard copy output may be produced, or a disk drive 24, to store 
system output for later use.

In operation, CPU 20 coordinates the use of the various input and output devices 36,
46, coordinates data accesses from mass storage 24 and accesses to and from working
memory 22, and determines the sequence of data processing steps. A number of programs
may be used to process the machine-readable data of this invention. Such programs are
discussed in reference to the computational methods of drug discovery as described herein.
Specific references to components of the hardware system 10 are included as appropriate 
throughout the following description of the data storage medium. 

FIG. 3 shows a cross section of a magnetic data storage medium 100 which can be
encoded with machine-readable data that can be processed by a system such as system 10
of FIG. 2. Medium 100 can be a conventional floppy diskette or hard disk, having a suitable
substrate 101, which may be conventional, and a suitable coating 102, which may be
conventional, on one or both sides, containing magnetic domains (not visible) whose polarity
or orientation can be altered magnetically. Medium 100 may also have an opening (not
shown) for receiving the spindle of a disk drive or other data storage device 24.

The magnetic domains of coating 102 of medium 100 are polarized or oriented so as
to encode, in a manner which may be conventional, machine-readable data such as that
described herein, for processing by a system such as system 10 of FIG. 2.

FIG. 4 shows a cross section of an optically-readable data storage medium 110 which
also can be encoded with such machine-readable data, or set of instructions, which can be
carried out by a system such as system 10 of FIG. 2. Medium 110 can be a conventional
compact disk read only memory (CD-ROM) or a rewritable medium such as a magneto-
optical disk which is optically readable and magneto-optically writable. Medium 110
preferably has a suitable substrate 111, which may be conventional, and a suitable coating
112, which may be conventional, usually on one side of substrate 111.

In the case of CD-ROM, as is well known, coating 112 is reflective and is impressed
with a plurality of pits 113 to encode the machine-readable data. The arrangement of pits is
read by reflecting laser light off the surface of coating 112. A protective coating 114, which 
preferably is substantially transparent, is provided on top of coating 112. 

In the case of a magneto-optical disk, as is well known, coating 112 has no pits 113,
but has a plurality of magnetic domains whose polarity or orientation can be changed
magnetically when heated above a certain temperature, as by a laser (not shown). The 
orientation of the domains can be read by measuring the polarization of laser light reflected 
from coating 112. The arrangement of the domains encodes the data as described above. 

Thus, in accordance with the present invention, X-ray coordinate data capable of
being processed into a three-dimensional graphical display of a molecule or molecular
complex which comprises an α-GAL-like binding pocket is stored in a machine-readable
storage medium.

The human α-GAL X-ray coordinate data, when used in conjunction with a computer
programmed with software to translate those coordinates into the 3-dimensional structure of a
molecule or molecular complex comprising an α-GAL-like binding pocket, may be used for a
variety of purposes, such as drug discovery.

For example, the structure encoded by the data may be computationally evaluated for
its ability to associate with chemical entities. Chemical entities that associate with human
α-GAL may inhibit that enzyme, and are potential drug candidates. Alternatively, the structure
encoded by the data may be displayed in a graphical three-dimensional representation on a
computer screen. This allows visual inspection of the structure, as well as visual inspection of
the structure's association with chemical entities.

Thus, according to another aspect the invention relates to a method for evaluating the 
potential of a chemical entity to associate with: 

a) a molecule or molecular complex comprising a binding pocket defined by structure
coordinates of human α-galactosidase amino acids W47, D92, D93, Y134, C142, K168,
D170, E203, L206, Y207, R227, D231, D266, and M267, according to FIG. 1, or

b) a homologue of said molecule or molecular complex, wherein said homologue
comprises a binding pocket that has a root mean square deviation from the backbone atoms of
said amino acids of not more than 1.5 Å. The method comprises the steps of:

i) employing computational means to perform a fitting operation between the
chemical entity and a binding pocket of the molecule or molecular complex; and

ii) analyzing the results of said fitting operation to quantify the association between 
the chemical entity and the binding pocket. 
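
The two steps above (a fitting operation, then quantifying the association) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring function of any program named in this document. It assumes a simple pairwise energy made of a 12-6 Lennard-Jones term plus a Coulomb term with a constant dielectric. The `Atom` class, the parameter values, and the function names are hypothetical.

```python
from dataclasses import dataclass
from math import dist, sqrt
from typing import Sequence, Tuple

COULOMB_K = 332.06  # kcal·Å/(mol·e²), the usual electrostatic conversion constant


@dataclass(frozen=True)
class Atom:
    xyz: Tuple[float, float, float]  # position in Å
    charge: float = 0.0              # partial charge in e (illustrative)
    sigma: float = 3.4               # Lennard-Jones diameter in Å (illustrative)
    epsilon: float = 0.1             # Lennard-Jones well depth in kcal/mol (illustrative)


def pair_energy(a: Atom, b: Atom, dielectric: float = 4.0) -> float:
    """Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair."""
    r = dist(a.xyz, b.xyz)
    sigma = 0.5 * (a.sigma + b.sigma)      # Lorentz-Berthelot combining rules
    epsilon = sqrt(a.epsilon * b.epsilon)
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_K * a.charge * b.charge / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(entity: Sequence[Atom], pocket: Sequence[Atom]) -> float:
    """Step ii): quantify the association as a summed pairwise energy."""
    return sum(pair_energy(a, b) for a in entity for b in pocket)


def best_pose(poses: Sequence[Sequence[Atom]], pocket: Sequence[Atom]) -> Sequence[Atom]:
    """Step i): a crude fitting operation that keeps the lowest-energy trial pose."""
    return min(poses, key=lambda pose: interaction_energy(pose, pocket))
```

A real fitting operation would also sample orientations and conformations and would draw the pocket atoms from the FIG. 1 coordinates; here the trial poses are simply supplied by the caller.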

The term "chemical entity," as used herein, refers to chemical compounds, complexes 
of at least two chemical compounds, and fragments of such compounds or complexes. 

Alternatively, the structural coordinates of the human α-GAL binding pocket can be
utilized in a method for identifying a potential agonist or antagonist of a molecule comprising
a human α-GAL-like binding pocket. The method comprises the steps of:

a) using the atomic coordinates of human α-galactosidase amino acids W47, D92,
D93, Y134, C142, K168, D170, E203, L206, Y207, R227, D231, D266, and M267,
according to FIG. 1 ± a root mean square deviation from the backbone atoms of said amino
acids of not more than 1.5 Å, to generate a three-dimensional structure of a molecule
comprising an α-GAL-like binding pocket;

b) employing said three-dimensional structure to design or select said potential
agonist or antagonist;

c) synthesizing said agonist or antagonist; and

d) contacting said agonist or antagonist with said molecule to determine the ability of
said potential agonist or antagonist to interact with said molecule.

In important embodiments, the atomic coordinates of all the amino acids of
human α-GAL according to FIG. 1 ± a root mean square deviation from the backbone atoms
of said amino acids of not more than 1.5 Å, are used to generate a three-dimensional structure
of a molecule comprising an α-GAL-like binding pocket.

For the first time, the present invention permits the use of molecular design
techniques to identify, select and design chemical entities, including agonists and antagonists,
capable of binding to human α-GAL-like binding pockets. Because of the present invention,
the necessary information for designing new chemical entities and compounds that may
interact with human α-GAL-like binding pockets, in whole or in part, is provided.

Throughout this section, discussions about the ability of an entity to bind to, associate
with or inhibit a human α-GAL-like binding pocket refer to features of the entity alone.
Assays to determine if a compound binds to human α-GAL are well known in the art and are
exemplified below.

The design of compounds that bind to or inhibit human α-GAL-like binding pockets
according to this invention generally involves consideration of two factors. First, the entity
must be capable of physically and structurally associating with parts or all of the human
α-GAL-like binding pockets. Non-covalent molecular interactions important in this
association include hydrogen bonding, van der Waals interactions, hydrophobic interactions
and electrostatic interactions.

Second, the entity must be able to assume a conformation that allows it to associate
with the human α-GAL-like binding pocket directly. Although certain portions of the entity
will not directly participate in these associations, those portions of the entity may still
influence the overall conformation of the molecule. This, in turn, may have a significant
impact on potency. Such conformational requirements include the overall three-dimensional
structure and orientation of the chemical entity in relation to all or a portion of the binding
pocket, or the spacing between functional groups of an entity comprising several chemical
entities that directly interact with the human α-GAL-like binding pocket or homologues
thereof.

The potential inhibitory or binding effect of a chemical entity on a human α-GAL-like
binding pocket may be analyzed prior to its actual synthesis and testing by the use of
computer modeling techniques. If the theoretical structure of the given entity suggests
insufficient interaction and association between it and the human α-GAL-like binding pocket,
testing of the entity is obviated. However, if computer modeling indicates a strong
interaction, the molecule may then be synthesized and tested for its ability to bind to a human
α-GAL-like binding pocket. This may be achieved by testing the ability of the molecule to
inhibit human α-GAL using assays described in the art. In this manner, synthesis of
inoperative compounds may be avoided.

A potential inhibitor of a human α-GAL-like binding pocket may be computationally
evaluated by means of a series of steps in which chemical entities or fragments are screened
and selected for their ability to associate with the human α-GAL-like binding pockets.

One skilled in the art may use one of several methods to screen chemical entities or
fragments for their ability to associate with a human α-GAL-like binding pocket. This
process may begin by visual inspection of, for example, a human α-GAL-like binding pocket
on the computer screen based on the human α-GAL structure coordinates in FIG. 1 or other
coordinates which define a similar shape generated from the machine-readable storage
medium. Selected fragments or chemical entities may then be positioned in a variety of
orientations, or docked, within that binding pocket as defined supra. Docking may be
accomplished using software such as Quanta™ and Sybyl™, followed by energy
minimization and molecular dynamics with standard molecular mechanics force fields, such
as Charmm™ and Amber™.

Specialized computer programs may also assist in the process of selecting fragments
or chemical entities. These include: GRID (P. J. Goodford, J. Med. Chem., 1985, 28:849-
857), available from Oxford University, Oxford, UK; MCSS (A. Miranker et al., Proteins:
Structure, Function and Genetics, 1991, 11:29-34), available from Molecular Simulations,
San Diego, CA; AUTODOCK (D. S. Goodsell et al., Proteins: Structure, Function, and
Genetics, 1990, 8:195-202), available from Scripps Research Institute, La Jolla, CA; and DOCK
(I. D. Kuntz et al., J. Mol. Biol., 1982, 161:269-288), available from University of California,
San Francisco, CA.

Other suitable software that can be used to view, analyze, design, and/or model a
protein, and/or protein fragments, includes but is not limited to: Alchemy™, LabVision™,
Sybyl™, Molcadd™, Leapfrog™, Matchmaker™, Genefold™ and Sitel™ (available from
Tripos Inc., St. Louis, MO); Quanta™, Cerius2™, X-Plor™, CNS™, Catalyst™,
Modeller™, ChemX™, Ludi™, Insight™, Discover™, Cameleon™ and Iditis™ (available
from Accelrys Inc., Princeton, N.J.); Rasmol™ (available from Glaxo Research and
Development, Greenford, Middlesex, U.K.); MOE™ (available from Chemical Computing
Group, Montreal, Quebec, Canada); Maestro™ (available from Schrödinger Inc.);
Midas/MidasPlus™ (available from UCSF, San Francisco, CA); VRML (webviewer-freeware
on the internet); Chime (MDL-freeware on the internet); MOIL (available from
University of Illinois, Urbana-Champaign, IL); MacroModel™ and GRASP™ (available
from Columbia University, New York, NY); Ribbon™ (available from University of
Alabama, Tuscaloosa, AL); NAOMI™ (available from Oxford University, Oxford, UK);
Explorer Eyechem™ (available from Silicon Graphics Inc., Mountain View, CA);
Univision™ (available from Cray Research Inc., Seattle, WA); Molscript™ and O (available
from Uppsala University, Uppsala, Sweden); Chem 3D™ and Protein Expert™ (available
from Cambridge Scientific, MA); Chain™ (available from Baylor College of Medicine,
Houston, TX); Spartan™, MacSpartan™ and Titan™ (available from Wavefunction Inc.,
Irvine, CA); VMD™ (available from U. Illinois/Beckman Institute); Sculpt™ (available from
Interactive Simulations, Inc., Portland, OR); Procheck™ (available from Brookhaven
National Laboratory, Upton, NY); DGEOM (available from QCPE-Quantum Chemistry
Program Exchange, Indiana University, Bloomington, IN); RE_VIEW (available from Brunel
University, London, UK); Xmol (available from Minnesota Supercomputing Center,
University of Minnesota, Minneapolis, MN); Hyperchem™ (available from Hypercube, Inc.,
Gainesville, FL); MD Display (available from University of Washington, Seattle, WA); PKB
(available from National Center for Biotechnology Information, NIH, Bethesda, MD);
Molecular Discovery Programmes (available from Molecular Discovery Limited, Mayfair,
London); Growmol™ (available from Thistlesoft, Morris Township, N.J.); MICE (available
from The San Diego Supercomputer Center, La Jolla, CA); Yummie and MCPro (available
from Yale University, New Haven, CT); Caveat™ (P. A. Bartlett et al., In "Molecular
Recognition in Chemical and Biological Problems", Special Pub., Royal Chem. Soc., 1989,
78:182-196; G. Lauri and P. A. Bartlett, J. Comput. Aided Mol. Des., 1994, 8:51-66),
available from the University of California, Berkeley, CA; 3D Database systems such as
ISIS™ (MDL Information Systems, San Leandro, CA; this area is reviewed in Y. C.
Martin, "3D Database Searching in Drug Design", J. Med. Chem., 1992, 35:2145-2154);
Hook™ (M. B. Eisen et al., Proteins: Struct., Funct., Genet., 1994, 19:199-221), available
from Molecular Simulations, San Diego, CA; and upgraded versions thereof.

Once suitable chemical entities or fragments have been selected, they can be
assembled into a single compound or complex. Assembly may be preceded by visual
inspection of the relationship of the fragments to each other on the three-dimensional image
displayed on a computer screen in relation to the structure coordinates of human α-GAL.
This would be followed by manual model building using software such as Quanta™ or
Sybyl™.

Instead of proceeding to build an inhibitor of a human α-GAL-like binding pocket in a
step-wise fashion, one fragment or chemical entity at a time as described above, inhibitory or
other human α-GAL binding compounds may be designed as a whole or "de novo" using
either an empty binding site or optionally including some portion(s) of a known inhibitor(s).

Other molecular modeling techniques may also be employed in accordance with this
invention [see, e.g., N. C. Cohen et al., J. Med. Chem., 1990, 33:883-894; see also, M. A.
Navia and M. A. Murcko, Curr. Opin. in Struct. Biology, 1992, 2:202-210; L. M. Balbes et
al., "A Perspective of Modern Methods in Computer-Aided Drug Design", in Reviews in
Computational Chemistry, Vol. 5, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York,
pp. 337-380 (1994); see also, W. C. Guida, Curr. Opin. Struct. Biology, 1994, 4:777-781].

Once a compound has been designed or selected by the above methods, the efficiency
with which that entity may bind to a human α-GAL binding pocket may be tested and
optimized by computational evaluation. For example, an effective human α-GAL binding
pocket inhibitor must preferably demonstrate a relatively small difference in energy between
its bound and free states (i.e., a small deformation energy of binding). Thus, the most
efficient human α-GAL binding pocket inhibitors should preferably be designed with a
deformation energy of binding of not greater than about 10 kcal/mole, more preferably, not
greater than 7 kcal/mole. Human α-GAL binding pocket inhibitors may interact with the
binding pocket in more than one conformation that is similar in overall binding energy. In
those cases, the deformation energy of binding is taken to be the difference between the
energy of the free entity and the average energy of the conformations observed when the
inhibitor binds to the protein.
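
The deformation-energy criterion just described reduces to simple arithmetic, sketched below in Python. The sketch assumes that conformational energies (in kcal/mole) have already been computed by some molecular mechanics or quantum chemistry package; the function names and the example energies are hypothetical. The sign convention, average bound energy minus free energy, is an assumption chosen so that a strained bound conformation gives a positive value.

```python
from statistics import fmean
from typing import Sequence

PREFERRED_LIMIT = 10.0       # kcal/mole, "not greater than about 10" in the text
MORE_PREFERRED_LIMIT = 7.0   # kcal/mole, "more preferably, not greater than 7"


def deformation_energy(free_energy: float, bound_energies: Sequence[float]) -> float:
    """Average energy of the bound conformations minus the energy of the free entity."""
    if not bound_energies:
        raise ValueError("at least one bound conformation is required")
    return fmean(bound_energies) - free_energy


def classify(energy: float) -> str:
    """Label a deformation energy against the two thresholds stated in the text."""
    if energy <= MORE_PREFERRED_LIMIT:
        return "more preferred"
    if energy <= PREFERRED_LIMIT:
        return "preferred"
    return "outside the preferred range"


# Hypothetical energies for one candidate that binds in two conformations.
example = deformation_energy(free_energy=-120.0, bound_energies=[-114.0, -112.0])
```

When only one bound conformation is observed, the average collapses to that single energy, which matches the simpler definition given earlier in the paragraph.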

An entity designed or selected as binding to a human α-GAL binding pocket may be
further computationally optimized so that in its bound state it would preferably lack repulsive
electrostatic interaction with the target enzyme and with the surrounding water molecules.
Such non-complementary electrostatic interactions include repulsive charge-charge, dipole-
dipole and charge-dipole interactions.

Specific computer software is available in the art to evaluate compound deformation 
energy and electrostatic interactions. Examples of software designed for such uses include: 
Gaussian 94, revision C (M. J. Frisch, Gaussian, Inc., Pittsburgh, PA, ©1995); AMBER, 
version 4.1 (P. A. Kollman, University of California at San Francisco, ©1995);
QUANTA/CHARMM (Molecular Simulations, Inc., San Diego, CA, ©1995); Insight
II/Discover (Molecular Simulations, Inc., San Diego, CA, ©1995); DelPhi (Molecular
Simulations, Inc., San Diego, CA, ©1995); and AMSOL (Quantum Chemistry Program
Exchange, Indiana University). These programs may be implemented, for instance, using a
Silicon Graphics workstation such as an Indigo 2 with "IMPACT" graphics. Other hardware
systems and software packages will be known to those skilled in the art.

Another approach enabled by this invention is the computational screening of small
molecule databases for chemical entities or compounds that can bind in whole, or in part, to a
human α-GAL binding pocket. In this screening, the quality of fit of such entities to the
binding site may be judged either by shape complementarity or by estimated interaction
energy (E. C. Meng et al., J. Comp. Chem., 1992, 13:505-524).

According to another embodiment, the invention provides compounds which associate
with a human α-GAL-like binding pocket produced or identified by the method set forth
above.

The structure coordinates set forth in FIG. 1 can also be used to aid in obtaining
structural information about another crystallized molecule or molecular complex. This may
be achieved by any of a number of well-known techniques, including molecular replacement.

Therefore, in another aspect this invention provides a method of utilizing molecular
replacement to obtain structural information about a molecule or molecular complex whose
structure is unknown, comprising the steps of:

a) crystallizing said molecule or molecular complex of unknown structure;

b) generating X-ray diffraction data from said crystallized molecule or molecular
complex; and

c) applying at least a portion of the structure coordinates set forth in FIG. 1 to the
X-ray diffraction data to generate a three-dimensional electron density map of the molecule or
molecular complex whose structure is unknown.

By using molecular replacement, all or part of the structure coordinates of the human
α-GAL as provided by this invention (and set forth in FIG. 1) can be used to determine the
structure of a crystallized molecule or molecular complex whose structure is unknown more
quickly and efficiently than attempting to determine such information ab initio.

Molecular replacement provides an accurate estimation of the phases for an unknown
structure. Phases are a factor in equations used to solve crystal structures that cannot be
determined directly. Obtaining accurate values for the phases, by methods other than
molecular replacement, is a time-consuming process that involves iterative cycles of
approximations and refinements and greatly hinders the solution of crystal structures.
However, when the crystal structure of a protein containing at least a homologous portion has
been solved, the phases from the known structure provide a satisfactory estimate of the
phases for the unknown structure.

Thus, this method involves generating a preliminary model of a molecule or
molecular complex whose structure coordinates are unknown, by orienting and positioning
the relevant portion of the human α-GAL according to FIG. 1 within the unit cell of the
crystal of the unknown molecule or molecular complex so as best to account for the observed
X-ray diffraction data of the crystal of the molecule or molecular complex whose structure is
unknown. Phases can then be calculated from this model and combined with the observed
X-ray diffraction data amplitudes to generate an electron density map of the structure whose
coordinates are unknown. This, in turn, can be subjected to any well-known model building
and structure refinement techniques to provide a final, accurate structure of the unknown
crystallized molecule or molecular complex [E. Lattman, Meth. Enzymol., 1985, 115:55-77;
M. G. Rossmann, ed., "The Molecular Replacement Method", Int. Sci. Rev. Ser., No. 13,
Gordon & Breach, New York (1972)].
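
The central step of this method, combining phases calculated from the positioned model with the observed amplitudes, can be illustrated with a deliberately simplified sketch. The Python code below works in one dimension with point atoms and a direct Fourier sum; it is a toy under those assumptions and does not reproduce any crystallographic program cited in this document. The atom positions, weights, and function names are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1-D cell; atoms are (fractional x, weight)."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density_map(observed_amplitudes, model_factors, n_grid):
    """Fourier synthesis using observed amplitudes and phases calculated from the model."""
    rho = []
    for k in range(n_grid):
        x = k / n_grid
        total = 0j
        for h, amplitude in observed_amplitudes.items():
            phase = cmath.phase(model_factors[h])  # phase borrowed from the positioned model
            total += amplitude * cmath.exp(1j * phase - 2j * math.pi * h * x)
        rho.append(total.real)
    return rho


# Hypothetical 1-D "crystal": the unknown structure has three atoms, while the
# positioned search model accounts for only two of them.
unknown = [(0.20, 8.0), (0.55, 6.0), (0.80, 6.0)]
search_model = [(0.20, 8.0), (0.55, 6.0)]

observed = {h: abs(f) for h, f in structure_factors(unknown, 10).items()}
model = structure_factors(search_model, 10)
rho = density_map(observed, model, n_grid=100)
```

In a map of this kind, density is generally expected to appear both at the model atoms and, more weakly, at the position of the atom the model lacks, which is what allows the missing portion of the unknown structure to be built.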

The structure of any portion of any crystallized molecule or molecular complex that is
sufficiently homologous to any portion of human α-GAL can be resolved by this method.

In a preferred embodiment, the method of molecular replacement is utilized to obtain
structural information about another galactosidase. The structure coordinates of human
α-GAL as provided by this invention are particularly useful in solving the structure of other
isoforms of α-GAL or other α-GAL-containing complexes.

Furthermore, the structure coordinates of human α-GAL as provided by this invention
are useful in solving the structure of α-GAL proteins that have amino acid substitutions,
additions and/or deletions (referred to collectively as "human α-GAL mutants"), as compared
to naturally occurring human α-GAL isoforms. These human α-GAL mutants may optionally
be crystallized in co-complex with a chemical entity, such as galactose. The crystal structures
of a series of such complexes may then be solved by molecular replacement and compared
with that of wild-type human α-GAL. Potential sites for modification within the various
binding sites of the enzyme may thus be identified. This information provides an additional
tool for determining the most efficient binding interactions, for example, increased
hydrophobic interactions, between human α-GAL and a chemical entity or compound.

The structure coordinates are also particularly useful to solve the structure of crystals
of human α-GAL or human α-GAL homologues co-complexed with a variety of chemical
entities. This approach enables the determination of the optimal sites for interaction between
chemical entities, including between candidate human α-GAL agonists and human α-GAL.
For example, high resolution X-ray diffraction data collected from crystals exposed to
different types of solvent allows the determination of where each type of solvent molecule
resides. Small molecules that bind tightly to those sites can then be designed and synthesized
and tested for their human α-GAL agonistic activity.

All of the complexes referred to above may be studied using well-known X-ray
diffraction techniques and may be refined versus 1.5-3.5 Å resolution X-ray data to an R
value of about 0.20 or less using computer software, such as X-PLOR [Yale University,
©1992, distributed by Molecular Simulations, Inc.; see, e.g., Blundell & Johnson, supra;
Meth. Enzymol., vol. 114 & 115, H. W. Wyckoff et al., eds., Academic Press (1985)]. This
information may thus be used to optimize known human α-GAL
agonists/antagonists/inhibitors, and more importantly, to design new human α-GAL
agonists/antagonists/inhibitors.
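
The R value mentioned above is conventionally defined as R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, summed over reflections. The short Python sketch below computes it for two lists of amplitudes. It is an illustration only: the amplitudes are hypothetical, the scale factor `k` is simply passed in rather than refined, and the function names are not taken from any refinement package.

```python
from typing import Sequence

R_TARGET = 0.20  # the "about 0.20 or less" refinement target stated in the text


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """Conventional crystallographic R value for matched observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("expected two non-empty amplitude lists of equal length")
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


def meets_target(r_value: float, target: float = R_TARGET) -> bool:
    """True when the R value is at or below the stated target."""
    return r_value <= target


# Hypothetical amplitudes for three reflections.
r_example = r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0])
```

The same function applies unchanged to a working set or to a held-out test set of reflections; only the choice of reflections differs.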

The invention will be more fully understood by reference to the following examples. 
These examples, however, are merely intended to illustrate the embodiments of the invention 
and are not to be construed to limit the scope of the invention. 

Examples

Cloning and Expression of human α-Galactosidase:

Human α-Galactosidase (Replagal™ lot G302-010, Transkaryotic Therapies, Inc.)
was produced using gene activation technology as described in detail in U.S. Patents Nos.
5,733,761, 6,270,989, and 6,565,844, all of which are expressly incorporated herein by
reference. Briefly, regulatory (e.g., a viral promoter) and structural DNA sequences were
inserted upstream of the endogenous human α-Galactosidase genomic locus (GenBank Acc.
No. HSU78027) in a human cell (e.g., HT-1080) using homologous recombination. As a
result, α-Galactosidase expression was enhanced, resulting in secretion of α-Galactosidase
protein to the culture supernatant. The α-Galactosidase polypeptide was then highly purified
using the methods described in detail in U.S. Patents Nos. 6,083,725, 6,395,884 and
6,458,574, all of which are expressly incorporated herein by reference.

Crystallization and x-ray data collection: 

Human α-Galactosidase was concentrated to 40 mg/ml in 20 mM Tris-HCl pH 7.5 prior
to crystallization trials. Crystals were grown in either hanging or sitting drops via vapor
diffusion against a reservoir solution of 30% polyethylene glycol (PEG) 4000 (Fluka),
100 mM Tris-HCl pH 8.0, and 200 mM ammonium sulfate. Crystals were then harvested into
35% PEG 4000, 100 mM Tris-HCl pH 7.5, and 20% (v:v) ethylene glycol. Crystals were
cooled in liquid nitrogen and then transferred into a gaseous nitrogen stream at 100 K for
x-ray data collection. Ligand-soaked crystals were transferred into 31% PEG 3350, 100 mM
sodium acetate pH 5.5, and 110 mM D-(+)-galactose (Sigma) prior to nitrogen cooling and
x-ray data collection. Despite efforts to increase their size, the crystals never grew larger than
30 × 30 × 100 µm. For each crystal, 180° of diffraction data were collected at beamline
22-ID at the Advanced Photon Source. Processing of x-ray images using the HKL2000 package
(Otwinowski, Z. & Minor, W., Methods in Enzymology, 1997, 276:307-326) revealed unit
cell constants of approximately 89 Å × 89 Å × 215 Å in space group P3₁21 or P3₂21. The
diffraction from these crystals proved to be extremely anisotropic, with reflections visible to
2.8 Å in the direction of the crystallographic c axis, but only to approximately 4 Å in the
perpendicular directions. This, plus the high redundancy and weak diffraction overall from the
small crystals, resulted in very poor merging statistics. The native frames were initially processed
using HKL2000 to 3.25 Å. Reprocessing the frames in MOSFLM and SCALA (Collaborative
Computational Project, Acta Crystallogr., 1994, D50:760-763) with anisotropic diffraction
limits produced maps of lower quality, so this route was abandoned, and the original data
were used throughout the refinement. The high resolution limits were determined from the
shell where <I/σI> dropped to 2. Intensities were adjusted with TRUNCATE (Collaborative
Computational Project, Acta Crystallogr., supra) prior to molecular replacement and
refinement.
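
The rule used above for the high resolution limit, the shell where the mean signal-to-noise ratio falls to 2, can be written as a short Python sketch. The shell statistics shown are hypothetical values invented for illustration and are not the statistics of the data described here; the function name is likewise hypothetical. The sketch assumes shells are listed from low to high resolution and keeps the last shell that is still at or above the threshold.

```python
from typing import Sequence, Tuple


def high_resolution_limit(
    shells: Sequence[Tuple[float, float]], threshold: float = 2.0
) -> float:
    """Return d_min (Å) of the last shell whose mean I/sigma is at or above the threshold.

    `shells` holds (d_min in Å, mean I/sigma) pairs ordered from low to high resolution.
    Scanning stops at the first shell that falls below the threshold.
    """
    limit = None
    for d_min, mean_i_over_sigma in shells:
        if mean_i_over_sigma < threshold:
            break
        limit = d_min
    if limit is None:
        raise ValueError("no shell reaches the threshold")
    return limit


# Hypothetical shell statistics, for illustration only.
example_shells = [(6.00, 15.2), (4.50, 9.1), (3.80, 5.0), (3.25, 2.1), (3.00, 1.2)]
cutoff = high_resolution_limit(example_shells)
```

For strongly anisotropic data such as those described here, the same rule could be applied separately along different reciprocal-space directions, although the text reports that the anisotropic reprocessing route was abandoned.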

Phasing, model building, and refinement: 

Molecular replacement calculations were performed in the program AMoRe
(Collaborative Computational Project, Acta Crystallogr., 1994, D50:760-763) using a
homology model of the human α-GAL protein built from the crystal structure of chicken
α-NAGAL (Garman, S. C., et al., Structure, 2002, 10:425-434). The dimeric model was rotated
and translated against the 8-4 Å diffraction amplitudes. Molecular replacement in both
enantiomorphic space group possibilities identified a dimer of α-GAL in the asymmetric unit
of space group P3₂21 as the top solution, with a correlation coefficient of 28 and an R factor of
58. Inspection of the packing showed no steric clashes in a unit cell with 50% solvent
content. Rigid body refinement in the programs AMoRe and CNS (Brünger, A. T., et al., Acta
Crystallogr. D Biol. Crystallogr., 1998, 54:905-21) was followed by model building in the
program O (Jones, T. A., et al., Acta Crystallogr., 1991, A47:110-9). Residue numbering of
the α-GAL protein begins at the secretory signal; the mature protein begins at amino acid 32.
Refinement protocols in CNS included conjugate gradient minimization, simulated annealing,
and temperature factor refinement. Models were built into σA-weighted simulated annealing
composite omit maps calculated in CNS. Strong two-fold non-crystallographic symmetry
restraints (300 kcal/mol·Å²) were imposed on all atoms in the early stages of refinement, and
later relaxed for the atoms that differ between the two halves of the dimer, including those in
crystal contacts and N-linked carbohydrate atoms. Refinement steps were accepted only if
they reduced the Rfree (of a test set comprised of 820 reflections, 5% of the total, selected
using resolution shells). The Rwork and Rfree for the native structure are 26.2% and 30.1%,
respectively, using all reflections. Because of the limited resolution, side chain rotamers were
typically chosen during manual rebuilding to be consistent with the 1.9 Å chicken α-NAGAL
structure.
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Sequence alignments, calculations, and figures: 

Beginning with the human α-GAL sequence, a BLAST search (Altschul, S. F., et al.,
Nucleic Acids Res, 1997, 25:3389-402) of the NCBI non-redundant protein sequence database
found the 50 closest sequences. After removal of 10 highly redundant sequences, the
remaining 40 sequences were multiply aligned in CLUSTALW (Thompson, J. D., et al.,
Nucleic Acids Res, 1994, 22:4673-80), then converted into a phylogeny tree using the
programs WEIGHBOR (Bruno, W. J., et al., Mol Biol Evol, 2000, 17:189-97) and PHYLIP
(Felsenstein, J., Phylogeny Inference Package version 3.6, 1995, Department of Genetics,
University of Washington, Seattle, WA). The accession codes of the 40 sequences from the
NCBI non-redundant database are: NP_000160, NP_038491, CAC44626, XP_318652,
AAM29494, XP_315871, NP.61IU9, AAL87527, XP_235515, NP_000253, 1KTB,
NP_506031, NP_822650, NP_624613, AAC99325, NP_821803, BAB83765, ZP_00066516,
AAM13199, AAP04002, AAG13536, BAC55816, NP_568193, CAC08337, Q42656,
BAC66445, T06388, T10860, P14749, AAF04591, BAB12570, NP_191190, S45453, P41947,
NP_595012, AAG24511, AAB35252, JC5558, NP_811977, and P28351. Sequence identities
were calculated without signal sequences in EMBOSS using a Needleman-Wunsch full path
matrix algorithm with the BLOSUM62 matrix, a gap penalty of 10, and a gap extension
penalty of 0.5 (Needleman, S. B. & Wunsch, C. D., J Mol Biol, 1970, 48:443-53). Least
squares superpositions of coordinates were performed using the program LSQMAN
(Kleywegt, G. J. & Read, R. J., Structure, 1997, 5:1557-1569) with a distance cutoff of 3.8 Å,
and coordinate transformations were applied using the program MOLEMAN2 (Kleywegt, G.
J. & Read, R. J., Structure, 1997, 5:1557-1569). Molecular figures were prepared using the
programs MOLSCRIPT (Kraulis, P. J., J. Appl. Crystallogr., 1991, 24:946-950),
BOBSCRIPT (Esnouf, R. M., J Mol Graph Model, 1997, 15:132-34), and GRASP
(Nicholls, A., et al., Proteins, 1991, 11:281-96).
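
The pairwise identity calculation described above can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. This is not the EMBOSS implementation: for brevity it assumes a simple match/mismatch score and a single linear gap penalty in place of the BLOSUM62 matrix and the separate gap-opening (10) and gap-extension (0.5) penalties named in the text. The function names and the toy sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (simplifying assumption).

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of the alignment length."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy sequences, not taken from the 40 database entries.
aligned = needleman_wunsch("ACGT", "AGT")
print(aligned, percent_identity(*aligned))
```

The identity here is normalized by the alignment length including gap columns; other conventions (for example, dividing by the shorter sequence length) give different percentages for the same alignment.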

Results 

The structure of human α-GAL was determined by X-ray crystallographic methods to
a resolution limit of 3.25 Å (see Table 1 below).

Table 1: Crystallographic Statistics

                                            Native                Ligand
Data
  Beamline                                  APS 22-ID             APS 22-ID
  Wavelength, Å                             1.033                 1.033
  Space Group                               P3221                 P3221
  Cell Lengths, Å                           88.5, 88.5, 215.5     90.0, 90.0, 216.5
  Resolution, Å (last shell)                50-3.25 (3.37-3.25)   50-3.45 (3.57-3.45)
  No. of Observations (last shell)          156309 (9921)         91651 (8610)
  No. of Unique Observations (last shell)   16080 (1542)          13922 (1323)
  Completeness, % (last shell)              99.8 (98.7)           99.7 (98.9)
  Multiplicity (last shell)                 9.7 (6.5)             6.6 (6.7)
  Rsym (last shell)                         0.246 (0.740)         0.200 (0.745)
  <I/σI> (last shell)                       9.1 (2.4)             8.2 (2.4)
Refinement
  Rwork / Rfree                             26.2% / 30.1%         28.5% / 32.1%
  No. of Atoms: Protein                     6251                  6251
                Carbohydrate                268                   331
                Other                       18                    18
  Ramachandran: Favored                     74.4%                 74.3%
                Allowed                     23.0%                 23.8%
                Generous                    2.5%                  1.5%
                Forbidden                   0%                    0.4%
  RMS Deviations: Bonds                     0.009 Å               0.008 Å
                  Angles                    1.5°                  1.5°
                  Dihedrals                 22.8°                 22.8°
                  Impropers                 0.9°                  0.8°


$R_{sym} = \sum_h \sum_i |I_{h,i} - \langle I_h \rangle| / \sum_h \sum_i |I_{h,i}|$, where $I_{h,i}$ is the ith intensity measurement of reflection h and $\langle I_h \rangle$ is the average intensity
of that reflection.

$R_{work}$ / $R_{free}$ $= \sum_h ||F_P| - |F_C|| / \sum_h |F_P|$, where $F_C$ is the calculated and $F_P$ is the observed structure factor amplitude of reflection h for
the working/free set, respectively.
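
As a rough illustration of the two footnote definitions, the following sketch computes Rsym from repeated intensity measurements and Rwork or Rfree from paired observed and calculated amplitudes. The function names and all numeric inputs are made up for the example and are not data from Table 1.

```python
def r_sym(measurements):
    """Rsym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i |I_hi|.

    `measurements` maps a reflection index h to its list of intensity measurements.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum ||Fobs| - |Fcalc|| / sum |Fobs| over one set of reflections.

    Passing the working set gives Rwork; passing the held-out test set gives Rfree.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Made-up values for illustration only.
print(r_sym({(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0]}))
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```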



The X-ray structure reveals human α-GAL as a homodimeric glycoprotein with each
monomer composed of two domains, a (β/α)8 domain containing the active site and a
C-terminal domain containing eight antiparallel β strands on two sheets in a β sandwich (FIG.
6a). After removal of the 31 residue signal sequence, the first domain extends from residues
32 to 330 and contains the active site formed by the C-terminal ends of the β strands at the
center of the barrel, a typical location for the active site in (β/α)8 domains. The second domain,
comprising residues 331 to 429, packs against the first with an extensive interface, burying
2500 Å² of surface area within one monomer. The dimer has overall protein dimensions of
approximately 75 x 75 x 50 Å (FIG. 6b). The molecule is concave in the third dimension and
varies in thickness from approximately 20 to 50 Å (FIG. 6c). Electron density is visible for
390 and 391 amino acid residues (out of 398 total) in the two copies of the monomer in the
crystallographic asymmetric unit; the missing residues occur at the C-terminus. The two
monomers pack with an interface that extends the 75 Å width of the dimer and buries 2200 Å²
of surface area. In the dimer interface, 30 residues from each monomer contribute to the
interface, from loops β1-α1, β6-α6, β7-α7, β8-α8, β11-β12, and β15-β16. The dimer is
markedly negatively charged, as seen in a surface electrostatic potential (FIG. 6d). With 47
carboxylate groups and only 36 basic residues in the 398 residues in the molecule, the overall
charge per monomer is expected to be -11 at neutral pH. The carboxylates are most
concentrated around the active site, but in the low pH of the lysosome, many of these groups
become protonated, reducing the charge on the molecule. In addition to the negative charges
on the protein, the N-linked carbohydrate is highly phosphorylated and sialylated (Lee, K., et
al., Glycobiology, 2003, 13:305-13), further increasing its negative electrostatic potential. The
N-linked carbohydrates fall distal to the active sites (FIG. 6d). Each monomer contains
three N-linked carbohydrate sites, five disulfide bonds (C52-C94, C56-C63, C142-C172,
C202-C223, and C378-C382), two unpaired cysteines (C90 and C174), and three cis prolines
(P210, P380, and P389).
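
The net charge estimate above is simple bookkeeping: 36 basic residues minus 47 carboxylates gives -11. A minimal sketch of that count on a one-letter sequence follows. It assumes that only Asp/Glu and Lys/Arg are counted and that histidine and the chain termini are ignored; the text does not state which groups were included, and the toy sequence is hypothetical.

```python
def formal_charge(sequence):
    """Approximate net charge at neutral pH from residue counts.

    Assumption: Asp/Glu count as -1 and Lys/Arg as +1; His and termini are ignored.
    """
    acidic = sum(sequence.count(residue) for residue in "DE")
    basic = sum(sequence.count(residue) for residue in "KR")
    return basic - acidic


# Counts quoted in the text: 36 basic residues and 47 carboxylates per monomer.
print(36 - 47)
# Toy sequence, not the alpha-GAL sequence.
print(formal_charge("MDDEKRG"))
```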

As mentioned above, the C-terminal seven and eight residues of each chain have no
electron density associated with them and are presumably disordered. This disorder is
consistent with the observation of slight heterogeneity in the C-terminus of recombinant
human α-GAL, where the truncation of one or two residues from the C-terminus can occur
but has no effect upon the activity of the enzyme (Lee, K., et al., Glycobiology, 2003,
13:305-13). The structure offers no support for the observation that the removal of 2 to 10 residues
from the C-terminus increases the activity of α-GAL (Miyamura, N., et al., J Clin Invest,
1996, 98:1809-17), because the final residue seen in the structure falls at least 45 Å from each
active site and on the opposite face of the molecule.

Substrate specificity and catalytic mechanism

In both the native and galactose-soaked crystal structures, electron density appears in
the two crystallographically-independent active sites (FIGS. 8a and b). In the galactose-soaked
crystal, this density represents α-galactose, the normal catalytic product of the
enzyme (Ki ~1 mM). In the native structure, this density most likely derives from the
cryoprotectant ethylene glycol, a weak inhibitor of glycoside hydrolases (Tsitsanou, K. E., et
al., Protein Sci, 1999, 8:741-9), analogous to the insertion of glycerol into carbohydrate
binding sites on proteins (Garman, S. C., et al., Structure, 2002, 10:425-434; Tsitsanou, K. E.,
et al., Protein Sci, 1999, 8:741-9; Schmidt, A., et al., Protein Sci, 1998, 7:2081-8). The two
active sites of the dimer are separated by approximately 50 Å. As the enzyme shows little
change between the liganded and unliganded structures, there is no evidence for cooperativity
between the two sites, although the biochemical evidence is mixed (Dean, K. J. & Sweeley,
C. C., J Biol Chem, 1979, 254:9994-10000; Bishop, D. F. & Desnick, R. J., J Biol Chem,
1981, 256:1307-16).

We have determined that human α-GAL binds α-galactose by making specific
contacts to each functional group on the monosaccharide. Residues from seven loops in
domain 1 form the active site: β1-α1, β2-α2, β3-α3, β4-α4, β5-α5, β6-α6, and β7-α7. The
active site is formed by the side chains of residues W47, D92, D93, Y134, C142, K168,
D170, E203, L206, Y207, R227, D231, D266, and M267. Thus, a binding pocket defined by
the structural coordinates of these amino acids, as set forth in FIG. 1; or a binding pocket
whose root mean square deviation from the structure coordinates of the backbone atoms of
these amino acids is not more than 1.5 Å is considered a human α-GAL-like binding pocket
of this invention. In important embodiments, C172 makes a disulfide bond to C142.
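
The 1.5 Å criterion can be made concrete with a minimal root mean square deviation sketch. It assumes the two sets of backbone atoms have already been paired atom-for-atom and superimposed (the superposition itself, performed with LSQMAN in this work, is not reproduced here). The function names and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the atoms are already matched pairwise and superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_like_binding_pocket(reference, candidate, cutoff=1.5):
    """True when the backbone RMSD is not more than the cutoff (in Å)."""
    return rmsd(reference, candidate) <= cutoff


# Made-up coordinates: the candidate is the reference shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, candidate), is_like_binding_pocket(reference, candidate))
```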

In the α-GAL/α-NAGAL family, specificity for the 2 position on the galactose ligand
occurs via the β5-α5 loop. This was called the "N-acetyl recognition loop" in α-NAGAL
(Garman, S. C., et al., Structure, 2002, 10:425-434); in the overall α-GAL/α-NAGAL family
"2 position recognition loop" or "2 loop" is appropriate. This loop falls near the boundary of
exons 4 and 5 of animal α-GAL/α-NAGAL, which have a small insertion in this region,
resulting in a short helical stretch at the top of the β5 strand; this insertion is absent in other
species. Plant and fungal α-GALs use a Cys and a Trp on this loop to coordinate the
2-hydroxyl on galactose; animal α-GAL uses a Glu and a Leu to recognize the 2-hydroxyl
(FIG. 7, green) while animal α-NAGAL uses a Ser and an Ala to recognize an N-acetyl at the
2 position (FIG. 7, yellow). In the animal enzymes, the larger Glu and Leu side chains
sterically block the larger N-acetyl substituent, while the smaller Ser and Ala side chains
nicely accommodate an N-acetyl group and tolerate a hydroxyl group.

With three different conformations in the 2 loop now identified, the substrate
specificity of the other members of the family can be categorized by homology. For
example, genome sequencing of Drosophila melanogaster and Anopheles gambiae has each
identified pairs of genes in the α-GAL family. By examination of the sequences in the 2 loop,
two are clearly α-NAGALs while the other two appear to be α-GALs (FIG. 7, yellow and
purple). Surprisingly, Aspergillus niger contains an enzyme identified as α-GAL that,
although only 30% identical to the animal protein sequences, contains a 2 loop virtually
identical to animal α-NAGALs (FIG. 7, yellow). We predict this enzyme is primarily an
α-NAGAL with partial α-GAL activity, much like human α-NAGAL, which was originally
thought to be an α-GAL based upon similar activity (Dean, K. J., et al., Biochem. Biophys.
Res. Commun., 1977, 77:1411-7; Schram, A. W., et al., Biochim. Biophys. Acta, 1977,
482:138-44).
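
The homology-based categorization described above amounts to a lookup on the pair of 2-loop residues. The sketch below encodes only the three residue pairs named in the text; the function name, the labels, and the assumption that the two residues have already been located in an alignment are all illustrative.

```python
# Residue pairs on the 2 loop, as described in the text (one-letter codes).
TWO_LOOP_CLASSES = {
    ("E", "L"): "animal alpha-GAL (recognizes a 2-hydroxyl)",
    ("S", "A"): "alpha-NAGAL (accommodates a 2-N-acetyl)",
    ("C", "W"): "plant/fungal alpha-GAL (recognizes a 2-hydroxyl)",
}


def classify_two_loop(first_residue, second_residue):
    """Map the two specificity residues of the 2 loop to a predicted class.

    Returns "unclassified" for any pair not described in the text.
    """
    key = (first_residue.upper(), second_residue.upper())
    return TWO_LOOP_CLASSES.get(key, "unclassified")


# Human alpha-GAL carries Glu and Leu at these positions (E203 and L206).
print(classify_two_loop("E", "L"))
print(classify_two_loop("S", "A"))
```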

Although human α-GAL makes contacts to each functional group on the α-galactose
ligand, the enzyme shows little specificity for the distal portion of the substrate beyond the
glycosidic linkage, and the active site cleft is found in a broad opening on the concave
surface of the enzyme (FIG. 6c). The lack of substrate specificity of human α-GAL beyond
the terminal α-galactose differs slightly from the specificity of other α-GALs, which act only
upon substrates containing terminal α1-6 galactose groups (Kim, W. D., et al., Phytochemistry,
2002, 61:621-30). This increased specificity of plant α-GALs may derive from their
monomeric structure, as residues buried in the dimer interface of animal α-GALs (e.g., those
on the β1-α1 loop - Fujimoto, Z., et al., J Biol Chem, 2003, 278:20313-8) are available for
ligand recognition in monomeric α-GALs.

Both α-GALs and α-NAGALs are retaining exoglycosidases, where both the
substrate and product of the catalytic reactions are α anomers at the 1 position on the
galactose ring. This retention of anomeric configuration is accomplished by a double
displacement catalytic mechanism where the anomeric carbon undergoes two successive
nucleophilic attacks (Vasella, A., et al., Curr Opin Chem Biol, 2002, 6:619-29). The two
sequential inversions of the anomeric carbon lead to retention of the configuration at the end
of the catalytic cycle. In two α-GALs from different species, peptic digestion of covalently
trapped intermediates has identified the specific aspartic acid acting as the catalytic
nucleophile (Hart, D. O., et al., Biochemistry, 2000, 39:9826-36; Ly, H. D., et al., Carbohydr.
Res., 2000, 329:539-47). These data, combined with the high resolution structure of chicken
α-NAGAL, predict the catalytic mechanism of human α-GAL. In human α-GAL, the first
nucleophilic attack upon the substrate comes from D170, cleaving the glycosidic linkage and
leading to a covalent enzyme-intermediate complex. In the second step of the reaction, a
water molecule (deprotonated by D231) attacks C1 of the covalent intermediate, liberating
the second half of the catalytic product and regenerating the enzyme in its initial state.
Human α-GAL operates most efficiently at low pH, consistent with its highly acidic
composition and its lysosomal location.

Retaining glycosidases typically have distances of 5-6 Å between catalytic
carboxylates, while inverting glycosidases typically have distances of 9-11 Å between these
residues (McCarter, J. D. & Withers, S. G., Curr. Opin. Struct. Biol., 1994, 4:885-92). From
these distances, it has been possible to reliably predict the mechanism and function of a
glycosidase given its structure. However, this rule must be reconsidered in light of the new
structures in the α-GAL/α-NAGAL family: for the known structures in the family, the closest
approach of the two catalytic carboxylates is 6.5-7 Å, among the largest distances seen for
retaining glycosidases.
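
The distance rule and its limitation can be sketched as follows. The ranges are the 5-6 Å and 9-11 Å values quoted above; the function name, the return labels, and the example coordinates are hypothetical. A separation of 6.5-7 Å, as found in this family, falls outside both typical ranges even though the enzymes are retaining.

```python
import math


def classify_by_carboxylate_distance(distance):
    """Apply the typical-distance rule for glycosidase mechanism (distance in Å)."""
    if 5.0 <= distance <= 6.0:
        return "typical of retaining glycosidases"
    if 9.0 <= distance <= 11.0:
        return "typical of inverting glycosidases"
    return "outside both typical ranges"


# Made-up carboxylate oxygen positions, 6.8 Å apart along x.
oxygen_a = (0.0, 0.0, 0.0)
oxygen_b = (6.8, 0.0, 0.0)
separation = math.dist(oxygen_a, oxygen_b)  # requires Python 3.8 or later
print(separation, classify_by_carboxylate_distance(separation))
```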

Comparison to related molecules

Human α-GAL is most closely related to α-NAGAL, with the human enzymes
sharing 49% amino acid sequence identity. A phylogeny tree (FIG. 7) of the 40 proteins most
closely related to human α-GAL reveals that vertebrate α-GAL and α-NAGAL cluster and
have evolved from a common precursor (Wang, A. M., et al., J. Biol. Chem., 1990,
265:21859-66; Wang, A. M., et al., Mol. Genet. Metab., 1998, 65:165-73), while plant and
other α-GALs segregate into distinct clusters. The 40 proteins share from 32 to 78% amino
acid sequence identity with human α-GAL, with the sequence conservation higher in domain
1, particularly among residues forming the active site.

The 40 sequences include two structures of family 27 glycoside hydrolases:
human α-GAL and chicken α-NAGAL (Garman, S. C., et al., Structure, 2002, 10:425-434)
(51% amino acid identity with human α-GAL). Both enzymes share common tertiary
structures: each monomer contains both a (β/α)8 N-terminal domain and an antiparallel β
C-terminal domain. The N-terminal domains superimpose very well: the chicken α-NAGAL
superimposes on the human α-GAL with a root mean square deviation (RMSD) of 0.7 Å for
295 Cα atoms. Domain 2, with lower sequence conservation, superimposes less well: the
chicken domain superimposes on human with an RMSD of 1.3 Å for 80 Cα atoms. The most
important residue in the dimer interface, F273, has 130 Å² surface area buried per monomer
upon formation of the dimer. This residue alone (out of the 30 in the dimer interface)
accounts for 12% of the buried surface area in the interface. This residue is a Phe or Tyr in
most animal α-GALs and α-NAGALs, while in plant α-GALs, the equivalent residue is a
Gly. Thus, this residue predicts the dimerization state of the enzyme in different species: Phe
or Tyr indicates the enzyme is a dimer, while Gly indicates the enzyme remains a monomer.
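
The prediction rule and the 12% figure can be checked with a short sketch. It assumes that the 2200 Å² buried in the dimer interface is the total over both monomers, so that each monomer contributes about 1100 Å²; the text does not state this split explicitly. The function name and return labels are hypothetical.

```python
def predict_oligomeric_state(residue_at_273_equivalent):
    """Predict dimer or monomer from the residue aligned with human F273."""
    residue = residue_at_273_equivalent.upper()
    if residue in ("F", "Y"):
        return "dimer"
    if residue == "G":
        return "monomer"
    return "not predicted by this rule"


# Share of the interface contributed by F273, assuming 2200 Å² is the two-monomer total.
buried_per_monomer = 2200.0 / 2
f273_share_percent = 100.0 * 130.0 / buried_per_monomer
print(round(f273_share_percent), predict_oligomeric_state("F"))
```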

N-linked carbohydrate and lysosomal targeting 

Both endogenous and recombinant α-GAL show a large amount of heterogeneity in
the attached carbohydrate, with over 70 different glycoforms (Lee, K., et al., Glycobiology,
2003, 13:305-13; Bishop, D. F. & Desnick, R. J., J Biol Chem, 1981, 256:1307-16; Matsuura,
F., et al., Glycobiology, 1998, 8:329-39; LeDonne, N. C., et al., Arch Biochem Biophys, 1983,
224:186-95; Ioannou, Y. A., et al., Biochem J, 1998, 332:789-97). Despite the limited resolution of
the human α-GAL structure, extensive density appears for N-linked carbohydrates. Each
monomer has four potential N-linked carbohydrate attachment sites (N139, N192, N215, and
N408), the first three of which show carbohydrate electron density. The fourth potential site
at N408 contains the amino acid sequence Asn-Pro-Thr, a sequence not ordinarily recognized
by the carbohydrate attachment machinery (Gavel, Y. & von Heijne, G., Protein Eng., 1990,
3:433-42), consistent with the absence of carbohydrate at this location in recombinant α-GAL
expressed in COS cells (Ioannou, Y. A., et al., Biochem J, 1998, 332:789-97), CHO cells and
human cells (Lee, K., et al., Glycobiology, 2003, 13:305-13). The three sites with attached
carbohydrate show density in both independent monomers in the asymmetric unit and in both
the native and ligand-soaked crystals. Electron density for the carbohydrate attached to N192
is shown in FIG. 9.
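
The sequence rule behind the N408 observation (Asn-X-Ser/Thr, with X other than Pro) can be sketched with a regular expression. The function name and the toy sequence are hypothetical; the sketch reports sequence motifs only and says nothing about whether a given site is actually glycosylated.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr; the lookahead permits overlapping hits.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, first_residue_number=1):
    """Return residue numbers of Asn residues that start an N-X-S/T motif (X not Pro)."""
    return [match.start() + first_residue_number for match in SEQUON.finditer(sequence)]


# Toy sequence: "NPT" (like the N408 site) is skipped, "NGS" is reported.
print(find_sequons("ANPTGNGSA"))
```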

The glycosylation pattern differs among the structures in the α-GAL/α-NAGAL
family. The chicken α-NAGAL and human α-GAL each contain three sites, two of which
(N192 and N215 in α-GAL numbering) are in common. These two carbohydrates are
attached to helices α4 and α5, away from the active site and from the dimer interface. The
N-linked carbohydrate at N215 is necessary but not sufficient for successful secretion of the
active enzyme, and the N192 carbohydrate site improves secretion of the active enzyme
(Ioannou, Y. A., et al., Biochem J, 1998, 332:789-97). These two sites have a large
proportion of oligomannose-containing carbohydrate, while the N139 site contains no
oligomannosyl carbohydrate, only complex carbohydrate (Lee, K., et al., Glycobiology, 2003,
13:305-13). Thus, the N-linked carbohydrate at N192 and N215 is responsible for targeting
the glycoprotein to the lysosome, because only oligomannosyl carbohydrates contain the
lysosomal targeting signal, mannose-6-phosphate (Ghosh, P., et al., Nat Rev Mol Cell Biol,
2003, 4:202-12). The N192 and N215 side chains are 20 Å apart on the same face of the
molecule, 24 and 23 Å away from the active site respectively (FIG. 6d). Unlike many
N-linked carbohydrates that lie along the surface of the protein and shield surface-exposed
hydrophobic residues, the carbohydrate at N215 extends away from the protein, in an ideal
position to bind to the mannose-6-phosphate receptor (M6PR). Mutation of N215 to Ser
eliminates the carbohydrate attachment site, causing inefficient trafficking of the enzyme to
the lysosome (Ioannou, Y. A., et al., Biochem J, 1998, 332:789-97) and leading ultimately to
the development of Fabry disease (Davies, J. P., et al., Hum Mol Genet, 1993, 2:1051-3).
Unique among the carbohydrate attachment sites, N215 shows different primary glycoforms
in the two recombinant enzymes used as Fabry disease treatments: in Replagal this site is
mostly singly phosphorylated oligomannose, while in Fabrazyme this site is mostly
bisphosphorylated oligomannose (Lee, K., et al., Glycobiology, 2003, 13:305-13). The M6PR
transport pathway is also used by the recombinant glycoprotein in the treatment for Fabry
disease: upon injection into the bloodstream of a Fabry patient, the recombinant glycoprotein
is delivered into the lysosomes of affected cells via M6PR on the surface. The
pharmacological differences between the Replagal and Fabrazyme α-GAL preparations
derive from the different glycoforms attached to N192 and N215.


Detailed Description of the Drawings

Figure 1. Atomic structure coordinates of human α-GAL

Figures 1A through 1Z list the atomic structure coordinates for human α-GAL as
derived by X-ray diffraction from a crystal of human α-GAL. The following abbreviations
are used in FIG. 1:

"Atom type" refers to the element whose coordinates are measured. The
first letter in the column defines the element.
"X, Y, Z" crystallographically define the atomic position of the element measured.
"OCC" is an occupancy factor that refers to the fraction of the molecules in which
each atom occupies the position specified by the coordinates. A value of "1" indicates that
each atom has the same conformation, i.e., the same position, in all molecules of the crystal.
"B" is a thermal factor that measures movement of the atom around its atomic center.
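
As an illustration of how such a coordinate listing might be read by software, the sketch below extracts the atom name, position, occupancy, and thermal factor from one fixed-width record. It assumes the listing follows the standard Protein Data Bank ATOM record column layout, which the text does not confirm for FIG. 1. The function name is hypothetical and the record is made up, not taken from FIG. 1.

```python
def parse_atom_record(line):
    """Parse one PDB-style ATOM record (assumed layout) into a dictionary."""
    return {
        "atom_name": line[12:16].strip(),   # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21],                  # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60 ("OCC")
        "b_factor": float(line[60:66]),     # columns 61-66 ("B")
    }


# Build a made-up record with exact column widths rather than typing it by hand.
example = (
    "ATOM  " + f"{1:5d}" + " " + " CA " + " " + "ASP" + " " + "A" + f"{170:4d}" + "    "
    + f"{12.345:8.3f}{-6.789:8.3f}{0.5:8.3f}" + f"{1.0:6.2f}{45.67:6.2f}"
)
print(parse_atom_record(example))
```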
Figure 2. Computer Diagram

Computer used to generate a three-dimensional graphical representation of a molecule
or molecular complex according to this invention.
Figure 3. Cross section of a magnetic storage medium.
Figure 4. Cross section of an optically-readable data storage medium.
Figure 5. The reaction catalyzed by α-GAL

(a) The general reaction of α-GAL. A terminal galactose in the α anomeric
configuration is cleaved from an oligosaccharide, glycoprotein, or glycolipid, producing
α-galactose (Gal(α1)) and an alcohol. The carbons are numbered on α-galactose. (b) α-GAL
and Fabry disease. The Fabry disease substrate globotriaosylceramide is cleaved by α-GAL
to form lactosylceramide. In the absence of the functional enzyme, globotriaosylceramide
accumulates in the tissues.
Figure 6. The structure of α-GAL

(a) The α-GAL monomer. The monomer is colored from N (blue) to C terminus (red).
Domain 1 contains the active site at the center of the β strands in the (β/α)8 barrel, while
domain 2 contains antiparallel β strands. The galactose ligand is shown in yellow and red
CPK atoms. (b) and (c) Two views of the α-GAL dimer. The ribbon and ligand are colored as
in (a). The active sites are 50 Å apart in the dimer, on the concave surface of the molecule as
viewed from the side in (c). (d) The surface of α-GAL. Two views of the molecular surface
are shown with a probe radius of 1.4 Å, with the electrostatic surface potential plotted from
-10kT (red) to +10kT (blue). The N-linked carbohydrate is shown in green and is not included
in the surface potential calculation. The orientation at left is similar to (b).
Figure 7. Evolutionary relationships in the α-GAL/α-NAGAL family

A phylogeny tree demonstrates the relationships of 40 sequences most closely related to
human α-GAL. The length of the line connecting each name represents the distance between
the two sequences. The sequences above the black line have an insertion creating a turn of
helix in the β5-α5 loop, while the lower sequences lack this insertion. α-NAGALs are in
yellow, while α-GALs are in green, blue and purple.
Figure 8. The active site of α-GAL

(a) and (b) Electron density in human α-GAL from native and galactose-soaked crystals. The
electron density is shown in stereo contoured at 1.1σ from a σA-weighted simulated
annealing composite omit map, with side chains from active site residues colored as in FIG. 6.
The red density does not derive from the protein and is interpreted as an ethylene glycol
molecule in (a) and the catalytic product galactose in (b). In (c) the superimposed active sites
of human α-GAL (green) and chicken α-NAGAL (yellow) are shown in stereo. The β5-α5
loop that differs between the two structures appears at lower right.
Figure 9. N-linked carbohydrate

The N-linked carbohydrate attached to N192 is shown with helix α4. Electron density from a
σA-weighted simulated annealing composite omit map (grey) is contoured at 1.1σ. Five sugar
residues have been built into the electron density at this site.
Figure 10. The active site of α-GAL

A schematic representation of the human α-GAL active site with a galactose molecule buried
within.

Equivalents 

Those skilled in the art will recognize, or be able to ascertain using no more than
routine experimentation, many equivalents to the specific embodiments of the invention
described herein. Such equivalents are intended to be encompassed by the following claims.

All references disclosed herein are incorporated by reference in their entirety. What is
claimed is presented below and is followed by a Sequence Listing.

We claim:

Claims

1. A computer for producing a three-dimensional representation of:

a. a molecule or molecular complex, wherein said molecule or molecular
complex comprises a binding pocket defined by structure coordinates of human
α-galactosidase amino acids W47, D92, D93, Y134, C142, K168, D170, E203, L206,
Y207, R227, D231, D266, and M267, according to FIG. 1; or

b. a homologue of said molecule or molecular complex, wherein said
homologue comprises a binding pocket that has a root mean square deviation from the
backbone atoms of said amino acids of not more than 1.5 Å, wherein said computer
comprises:

(i) a computer-readable data storage medium comprising a data storage material
encoded with computer-readable data, wherein said data comprises the structure coordinates
of human α-galactosidase amino acids W47, D92, D93, Y134, C142, K168, D170, E203,
L206, Y207, R227, D231, D266, and M267, according to FIG. 1;

(ii) a working memory for storing instructions for processing said computer-readable
data;

(iii) a central-processing unit coupled to said working memory and to said
computer-readable data storage medium for processing said computer-readable data into said
three-dimensional representation; and

(iv) a display coupled to said central-processing unit for displaying said
three-dimensional representation.

2. The computer according to claim 1, wherein the computer produces a
three-dimensional representation of:

a. a molecule or molecular complex defined by structure coordinates of all of the
human α-galactosidase amino acids set forth in FIG. 1, or

b. a homologue of said molecule or molecular complex, wherein said homologue
comprises a binding pocket that has a root mean square deviation from the backbone atoms of
said amino acids of not more than 1.5 Å; and

wherein said computer readable data contains the coordinates of all of the human
α-galactosidase amino acids set forth in FIG. 1.

3. A computer for determining at least a portion of the structure coordinates
corresponding to X-ray diffraction data obtained from a molecule or molecular complex,
wherein said computer comprises:

(a) a computer-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said data comprises at least a portion of the
structural coordinates of human α-galactosidase according to FIG. 1;

(b) a computer-readable data storage medium comprising a data storage material
encoded with computer-readable data, wherein said data comprises X-ray diffraction data
obtained from said molecule or molecular complex;

(c) a working memory for storing instructions for processing said computer-readable
data of (a) and (b);

(d) a central-processing unit coupled to said working memory and to said
computer-readable data storage medium of (a) and (b) for performing a Fourier transform of the
machine readable data of (a) and for processing said computer-readable data of (b) into
structure coordinates; and

(e) a display coupled to said central-processing unit for displaying said structure
coordinates of said molecule or molecular complex.

4. The computer according to claim 3, wherein said molecule or molecular complex
comprises a polypeptide having α-galactosidase activity.

5. A method for evaluating the potential of a chemical entity to associate with:

a) a molecule or molecular complex comprising a binding pocket defined by structure
coordinates of human α-galactosidase amino acids W47, D92, D93, Y134, C142, K168,
D170, E203, L206, Y207, R227, D231, D266, and M267, according to FIG. 1, or

b) a homologue of said molecule or molecular complex, wherein said homologue
comprises a binding pocket that has a root mean square deviation from the backbone atoms of
said amino acids of not more than 1.5 Å, comprising the steps of:

i) employing computational means to perform a fitting operation between the
chemical entity and a binding pocket defined by structure coordinates of human
α-galactosidase amino acids W47, D92, D93, Y134, C142, K168, D170, E203, L206, Y207,
R227, D231, D266, and M267, according to FIG. 1 ± a root mean square deviation from the
backbone atoms of said amino acids of not more than 1.5 Å; and

ii) analyzing the results of said fitting operation to quantify the association between
the chemical entity and the binding pocket.

6. The method according to claim 5, wherein the method evaluates the potential of a
chemical entity to associate with:

a. a molecule or molecular complex defined by structure coordinates of all of the human
α-galactosidase amino acids, as set forth in FIG. 1, or

b. a homologue of said molecule or molecular complex having a root mean square
deviation from the backbone atoms of said amino acids of not more than 1.5 Å.

7. A method for identifying a potential agonist or antagonist of a molecule comprising a
human α-galactosidase domain 1-like binding pocket comprising the steps of:

a. using the atomic coordinates of W47, D92, D93, Y134, C142, K168, D170, E203,
L206, Y207, R227, D231, D266, and M267, according to FIG. 1 ± a root mean square
deviation from the backbone atoms of said amino acids of not more than 1.5 Å, to generate a
three-dimensional structure of a molecule comprising a human α-galactosidase domain 1-like
binding pocket;

b. employing said three-dimensional structure to design or select said potential agonist
or antagonist;

c. synthesizing said agonist or antagonist; and

d. contacting said agonist or antagonist with said molecule to determine the ability of
said potential agonist or antagonist to interact with said molecule.

8. The method according to claim 7, wherein in step a., the atomic coordinates of all the
amino acids of human α-galactosidase according to FIG. 1 ± a root mean square deviation
from the backbone atoms of said amino acids of not more than 1.5 Å are used.

Abstract

This invention pertains to the X-ray crystal structure of the human a-galactosidase 
glycoprotein. More specifically, the invention relates to crystallized compositions of human 
a-galactosidase and to crystallized complexes of human a-galactosidase and its catalytic 
product a-galactose. The invention further relates to a computer programmed with the 
structure coordinates of the human a-galactosidase* s active site wherein said computer is 
capable of displaying a three-dimensional representation of that active site. The invention 
also relates to methods for rational drug design based on the structural data for human a- 
galactosidase provided on computer readable media, as analyzed on a computer system 
having suitable computer algorithms. 
[Figure: Gal(α1-4)Gal(β1-4)Glc(β1-1)Ceramide, globotriaosylceramide (Fabry disease substrate); Gal(β1-4)Glc(β1-1)Ceramide, lactosylceramide] 
APPLICATION DATA SHEET FORM 

Inventor Information 

Inventor One Given Name:: Scott C. 
Family Name:: Garman 
Postal Address Line One:: 
City:: Rockville 
State or Province:: Maryland 
Postal or Zip Code:: 20850 
Citizenship Country:: US 


Inventor Two Given Name:: David N. 
Family Name:: Garboczi 
Postal Address Line One:: 
City:: Gaithersburg 
State or Province:: Maryland 
Postal or Zip Code:: 20877 
Citizenship Country:: US 


Inventor Three Given Name:: Richard F. 
Family Name:: Selden 
Postal Address Line One:: 
City:: Wellesley 
State or Province:: Massachusetts 
Postal or Zip Code:: 02482 
Citizenship Country:: US 


Inventor Four Given Name:: Douglas A. 
Family Name:: Treco 
Postal Address Line One:: 
City:: Arlington 
State or Province:: Massachusetts 
Postal or Zip Code:: 02476 
Citizenship Country:: US 


Inventor Five Given Name:: Michael W. 
Family Name:: Heartlein 
Postal Address Line One:: 
City:: Boxborough 
State or Province:: Massachusetts 
Postal or Zip Code:: 01719 
Citizenship Country:: US 


Inventor Six Given Name:: Marianne 
Family Name:: Borowski 
Postal Address Line One:: 
City:: Glen 
State or Province:: New Hampshire 
Postal or Zip Code:: 03838 
Citizenship Country:: US 

Correspondence Information 

Name Line One:: 
Name Line Two:: 
Address Line One:: 
Address Line Two:: 
City:: 
State or Province:: 
Country:: 
Postal or Zip Code:: 
Telephone One:: 
Telephone Two:: 
Fax Number:: 
Electronic Mail:: 

Application Information 

Title Line One:: 
Total Specification Sheets w/Claims:: 
Total Drawing Sheets:: 
Sequence Listing Sheets:: 
Claims:: 
Application Type:: 
Docket Number:: 
Date of deposit:: 
Express Mail No.:: 

Representative Information 

Name Line One:: 
Name Line Two:: 
Address Line One:: 
Address Line Two:: 
City:: 
State or Province:: 
Country:: 
Postal or Zip Code:: 
Telephone One:: 
Telephone Two:: 
Fax Number:: 
Electronic Mail:: 

Konstantinos Andrikopoulos, J.D., Ph.D. 
617-349-0200 
617-613-4020 
kandrikopoulos@tktx.com 

Representative Customer Number 

Continuity Information 

Prior Foreign Applications 

Foreign Application One- 
Filing Date:: 
Country:: 
Priority Claimed:: 




SEQUENCE LISTING 

<110> National Institute of Allergy and Infectious Diseases, NIH 
Transkaryotic Therapies, Inc. 
Garman, Scott C. 
Garboczi, David N. 
Selden, Richard F. 
Treco, Douglas A. 
Heartlein, Michael W. 
Borowski, Marianne 

<120> CRYSTAL STRUCTURE OF HUMAN ALPHA-GALACTOSIDASE 

<130> 0402 

<160> 2 

<170> PatentIn version 3.2 

<210> 1 

<211> 1290 

<212> DNA 

<213> Homo sapiens 

<400> 1 



atgcagctga ggaacccaga actacatctg ggctgcgcgc ttgcgcttcg cttcctggcc        60 
ctcgtttcct gggacatccc tggggctaga                                        120 
accatgggct ggctgcactg ggagcgcttc atgtgcaacc ttgactgcca ggaagagcca       180 
gattcctgca tcagtgagaa gctcttcatg gagatggcag agctcatggt ctcagaaggc       240 
tggaaggatg caggttatga gtacctctgc attgatgact gttggatggc tccccaaaga       300 
gattcagaag gcagacttca ggcagaccct cagcgctttc ctcatgggat tcgccagcta       360 
gctaattatg ttcacagcaa aggactgaag ctagggattt atgcagatgt tggaaataaa       420 
acctgcgcag gcttccctgg gagttttgga tactacgaca ttgatgccca gacctttgct       480 
gactggggag tagatctgct aaaatttgat ggttgttact gtgacagttt ggaaaatttg       540 
gcagatggtt ataagcacat gtccttggcc ctgaatagga ctggcagaag cattgtgtac       600 
tcctgtgagt ggcctcttta tatgtggccc tttcaaaagc ccaattatac agaaatccga       660 
cagtactgca atcactggcg aaattttgct gacattgatg attcctggaa aagtataaag       720 
agtatcttgg actggacatc ttttaaccag gagagaattg ttgatgttgc tggaccaggg       780 
ggttggaatg acccagatat gttagtgatt ggcaactttg gcctcagctg gaatcagcaa       840 
gtaactcaga tggccctctg ggctatcatg gctgctcctt tattcatgtc taatgacctc       900 
cgacacatca gccctcaagc caaagctctc cttcaggata aggacgtaat tgccatcaat       960 
caggacccct tgggcaagca agggtaccag cttagacagg gagacaactt tgaagtgtgg      1020 
gaacgacctc tctcacgctt agcctgggct gtagctatga taaaccggca ggagattggt      1080 
ggacctcgct cttataccat cgcagttgct tccctgggta aaggagtggc ctgtaatcct      1140 
gcctgcttca tcacacagct cctccctgtg aaaaggaagc tagggttcta tgaatggact      1200 
tcaaggttaa gaagtcacat aaatcccaca ggcactgttc tgcctcagct agaaaacaca      1260 
atgcagatgt cattaaaaga cttactttaa                                       1290 



<210> 2 

<211> 429 

<212> PRT 

<213> Homo sapiens 



<400> 2 



Met Gln Leu Arg Asn Pro Glu Leu His Leu Gly Cys Ala Leu Ala Leu 
1               5                   10                  15 

Arg Phe Leu Ala Leu Val Ser Trp Asp Ile Pro Gly Ala Arg Ala Leu 
            20                  25                  30 

Asp Asn Gly Leu Ala Arg Thr Pro Thr Met Gly Trp Leu His Trp Glu 
        35                  40                  45 

Arg Phe Met Cys Asn Leu Asp Cys Gln Glu Glu Pro Asp Ser Cys Ile 
    50                  55                  60 

Ser Glu Lys Leu Phe Met Glu Met Ala Glu Leu Met Val Ser Glu Gly 
65                  70                  75                  80 

Trp Lys Asp Ala Gly Tyr Glu Tyr Leu Cys Ile Asp Asp Cys Trp Met 
                85                  90                  95 

Ala Pro Gln Arg Asp Ser Glu Gly Arg Leu Gln Ala Asp Pro Gln Arg 
            100                 105                 110 

Phe Pro His Gly Ile Arg Gln Leu Ala Asn Tyr Val His Ser Lys Gly 
        115                 120                 125 

Leu Lys Leu Gly Ile Tyr Ala Asp Val Gly Asn Lys Thr Cys Ala Gly 
    130                 135                 140 

Phe Pro Gly Ser Phe Gly Tyr Tyr Asp Ile Asp Ala Gln Thr Phe Ala 
145                 150                 155                 160 

Asp Trp Gly Val Asp Leu Leu Lys Phe Asp Gly Cys Tyr Cys Asp Ser 
                165                 170                 175 

Leu Glu Asn Leu Ala Asp Gly Tyr Lys His Met Ser Leu Ala Leu Asn 
            180                 185                 190 

Arg Thr Gly Arg Ser Ile Val Tyr Ser Cys Glu Trp Pro Leu Tyr Met 
        195                 200                 205 

Trp Pro Phe Gln Lys Pro Asn Tyr Thr Glu Ile Arg Gln Tyr Cys Asn 
    210                 215                 220 

His Trp Arg Asn Phe Ala Asp Ile Asp Asp Ser Trp Lys Ser Ile Lys 
225                 230                 235                 240 

Ser Ile Leu Asp Trp Thr Ser Phe Asn Gln Glu Arg Ile Val Asp Val 
                245                 250                 255 

Ala Gly Pro Gly Gly Trp Asn Asp Pro Asp Met Leu Val Ile Gly Asn 
            260                 265                 270 

Phe Gly Leu Ser Trp Asn Gln Gln Val Thr Gln Met Ala Leu Trp Ala 
        275                 280                 285 

Ile Met Ala Ala Pro Leu Phe Met Ser Asn Asp Leu Arg His Ile Ser 
    290                 295                 300 






Pro Gln Ala Lys Ala Leu Leu Gln Asp Lys Asp Val Ile Ala Ile Asn 
305                 310                 315                 320 

Gln Asp Pro Leu Gly Lys Gln Gly Tyr Gln Leu Arg Gln Gly Asp Asn 
                325                 330                 335 

Phe Glu Val Trp Glu Arg Pro Leu Ser Gly Leu Ala Trp Ala Val Ala 
            340                 345                 350 

Met Ile Asn Arg Gln Glu Ile Gly Gly Pro Arg Ser Tyr Thr Ile Ala 
        355                 360                 365 

Val Ala Ser Leu Gly Lys Gly Val Ala Cys Asn Pro Ala Cys Phe Ile 
    370                 375                 380 

Thr Gln Leu Leu Pro Val Lys Arg Lys Leu Gly Phe Tyr Glu Trp Thr 
385                 390                 395                 400 

Ser Arg Leu Arg Ser His Ile Asn Pro Thr Gly Thr Val Leu Leu Gln 
                405                 410                 415 

Leu Glu Asn Thr Met Gln Met Ser Leu Lys Asp Leu Leu 
            420                 425 
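
As an illustrative consistency check between the two listed sequences, the following minimal Python sketch translates the final 30 nucleotides of SEQ ID NO:1 (positions 1261 to 1290, with any spacing removed) using the standard genetic code and compares the result with the last nine residues of SEQ ID NO:2. It also checks the stated lengths against each other. The function and variable names are hypothetical, and the sketch assumes the standard (nuclear) genetic code applies.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace and case are ignored."""
    dna = "".join(dna.lower().split())
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Final row of SEQ ID NO:1 (positions 1261 to 1290).
TAIL_1261_1290 = "atgcagatgt cattaaaaga cttactttaa"
print(translate(TAIL_1261_1290))  # MQMSLKDLL*

# Length fields from the listing: <211> 1290 (DNA) and <211> 429 (protein).
DNA_LENGTH = 1290
PROTEIN_LENGTH = 429
print(DNA_LENGTH == 3 * (PROTEIN_LENGTH + 1))  # True
```

The one-letter output MQMSLKDLL corresponds to Met Gln Met Ser Leu Lys Asp Leu Leu (residues 421 to 429), followed by the stop codon, and 429 codons plus one stop codon account for the 1290 listed nucleotides.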



